The Art of Unix Programming

Revision History
Revision 0.0, 1999, esr
Public HTML draft, first four chapters only.
Revision 0.1, 16 November 2002, esr
First DocBook draft, fifteen chapters. Released to Mark Taub at AW.
Revision 0.2, 2 January 2003, esr
First manuscript walkthrough at Chapter 7. Released to Dmitry Kirsanov at AW production.
Revision 0.3, 22 January 2003, esr
First eighteen-chapter draft. Manuscript walkthrough at Chapter 12. Limited release for early reviewers.
Revision 0.4, 5 February 2003, esr
Release for public review.
Revision 0.41, 11 February 2003, esr
Corrections and additions to Mac OS case study. A bit more about binary files as caches. Added cite of Butler Lampson. Additions to history chapter. Note in futures chapter about C and exceptions. Many typo fixes.
Revision 0.42, 12 February 2003, esr
Add fcntl/ioctl to things Unix got wrong.

To Ken Thompson and Dennis Ritchie, because you inspired me.

Table of Contents

Requests for reviewers and copy-editors
Preface
Who Should Read This Book
How To Use This Book
Related References
Conventions Used In This Book
Our Case Studies
Author's Acknowledgements
I. Context
1. Philosophy
Culture? What culture?
The durability of Unix
The case against learning Unix culture
What Unix gets wrong
What Unix gets right
Open-source software
Cross-platform portability and open standards
The Internet
The open-source community
Flexibility in depth
Unix is fun to hack
The lessons of Unix can be applied elsewhere
Basics of the Unix philosophy
Rule of Modularity: Write simple parts connected by clean interfaces.
Rule of Composition: Design programs to be connected with other programs.
Rule of Clarity: Clarity is better than cleverness.
Rule of Simplicity: Design for simplicity; add complexity only where you must.
Rule of Transparency: Design for visibility to make inspection and debugging easier.
Rule of Robustness: Robustness is the child of transparency and simplicity.
Rule of Least Surprise: In interface design, always do the least surprising thing.
Rule of Repair: Repair what you can — but when you must fail, fail noisily and as soon as possible.
Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
Rule of Representation: Use smart data so program logic can be stupid and robust.
Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
Rule of Diversity: Distrust all claims for one true way.
Rule of Extensibility: Design for the future, because it will be here sooner than you think.
The Unix philosophy in one lesson
Applying the Unix philosophy
Attitude matters too
2. History
Origins and history of Unix, 1969-1995
Genesis: 1969-1971
Exodus: 1971-1980
TCP/IP and the Unix Wars: 1980-1990
Blows against the empire: 1991-1995
Origins and history of the hackers, 1961-1995
At play in the groves of academe: 1961-1980
Internet fusion and the Free Software Movement: 1981-1991
Linux and the pragmatist reaction: 1991-1998
The open-source movement: 1998 and onward
The lessons of Unix history
3. Contrasts
The elements of operating-system style
What is the unifying idea?
Cooperating processes
Internal boundaries
File attributes and record structures
Binary file formats
Preferred UI style
Who is the intended audience?
What are the entry barriers to development?
Operating-system comparisons
VMS
Mac OS
OS/2
Windows NT
BeOS
Linux
What goes around, comes around
II. Design
4. Modularity
Encapsulation and optimal module size
Compactness and orthogonality
Compactness
Orthogonality
The DRY rule
The value of detachment
Top-down, bottom-up, and glue layers
Case study: C considered as thin glue
Library layering
Case study: GIMP plugins
Unix and object-oriented languages
Coding for modularity
5. Textuality
The Importance of Being Textual
Case study: Unix password file format
Case study: .newsrc format
Case study: The PNG graphics file format
Data file metaformats
/etc/passwd style
RFC-822 format
Fortune-cookie format
XML
Windows INI format
Unix textual file format conventions
Application protocol design
Case study: SMTP, a simple socket protocol
Case study: POP3, the Post Office Protocol
Case study: IMAP, the Internet Message Access Protocol
Application protocol metaformats
The classical Internet application metaprotocol
HTTP as a universal application protocol
BEEP
XML-RPC, SOAP, and Jabber
Binary files as caches
6. Multiprogramming
Separating complexity control from performance tuning
Handing off tasks to specialist programs
Case study: the mutt mail user agent
Pipes, redirection, and filters
Case study: Piping to a Pager
Case study: making word lists
Case study: pic2graph
Case study: bc(1) and dc(1)
Slave processes
Case study: scp(1) and ssh
Wrappers
Case study: backup scripts
Security wrappers and Bernstein chaining
Peer-to-peer inter-process communication
Signals
System daemons and conventional signals
Case study: fetchmail's use of signals
Temp files
Shared memory via mmap
Sockets
Obsolescent Unix IPC methods
Client-Server Partitioning for Complexity Control
Case study: PostgreSQL
Case study: Freeciv
Two traps to avoid
Remote procedure calls
Threads — threat or menace?
A fearful synergy
7. Transparency
Some case studies
Case study: audacity
Case study: fetchmail's -v option
Case study: kmail
Case study: sng
Case study: the terminfo database
Case study: Freeciv data files
Designing for transparency and discoverability
The Zen of transparency
Coding for transparency and discoverability
Transparency and avoiding overprotectiveness
Transparency and editable representations
Transparency, fault diagnosis, and fault recovery
Designing for maintainability
8. Minilanguages
Taxonomy of languages
Applying minilanguages
Case study: sng
Case study: Glade
Case study: m4
Case study: XSLT
Case study: the DWB tools
Case study: fetchmailrc
Case study: awk
Case study: PostScript
Case study: bc and dc
Case study: Emacs Lisp
Case study: JavaScript
Designing minilanguages
Choosing the right complexity level
Extended and embedded languages
When you need a custom grammar
Macros — beware!
Language or application protocol?
9. Generation
Data-driven programming
Regular expressions
Case study: ascii
Case study: metaclass hacking in fetchmailconf
Ad-hoc code generation
Case study: generating code for a fixed screen display
Case study: generating HTML code for a tabular list
Special-purpose code generators
Yacc and Lex
Glade
Avoiding traps
10. Configuration
Run-control files
Case study: The .netrc file
Portability to other operating systems
Environment variables
Portability to other operating systems
Command-line options
The a to z of command-line options
Portability to other operating systems
How to choose among configuration-setting methods
Case study: fetchmail
Case study: the XFree86 server
On breaking these rules
11. Interfaces
Applying the Rule of Least Surprise
History of interface design on Unix
The right style for the right job
Tradeoffs between CLI and visual interfaces
Case study: Two ways to write a calculator program
Unix interface design patterns
The filter pattern
The cantrip pattern
The emitter pattern
The absorber pattern
The compiler pattern
The ed pattern
The rogue pattern
The ‘separated engine and interface’ pattern
The CLI server pattern
Language-based interface patterns
Applying Unix design patterns
The polyvalent-program pattern
The Web browser as universal front end
Silence is golden
III. Implementation
12. Languages
Unix's Cornucopia of Languages
Why Not C?
Interpreted Languages and Mixed Strategies
Language evaluations
C
C++
Shell
Perl
Tcl
Python
Java
Emacs Lisp
Trends for the Future
Choosing an X toolkit
13. Tools
A developer-friendly operating system
Choosing an editor
vi: lightweight but limited
Emacs: heavy metal editing
The benefits of knowing both
Is Emacs an argument against the Unix philosophy?
Make: automating your development recipes
Basic theory of make(1)
Make in non-C/C++ Development
Utility productions
Generating makefiles
Version-control systems
Why version control?
Version control by hand
Automated version control
Unix tools for version control
Run-time debugging
Profiling
Emacs as the universal front end
Emacs and make(1)
Emacs and run-time debugging
Emacs and version control
Emacs and Profiling
Like an IDE, only better...
14. Re-Use
The tale of J. Random Newbie
Transparency as the key to re-use
From re-use to open source
The best things in life are open
Where should I look?
What are the issues in using open-source software?
Licensing issues
What qualifies as open source
Standard open-source licenses
When you need a lawyer
Open-source software in the rest of this book
IV. Community
15. Portability
Evolution of C
Early history of C
C standards
Unix standards
Standards and the Unix wars
The ghost at the victory banquet
Unix standards in the open-source world
IETF and the RFC standards process
Specifications as DNA, code as RNA
Programming for Portability
Portability and choice of language
Avoiding system dependencies
Tools for portability
Portability, open standards and open source
16. Documentation
Documentation concepts
The Unix style
Technical background
Cultural style
The zoo of Unix documentation formats
troff and the DWB tools
TeX
Texinfo
POD
HTML
DocBook
The present chaos and a possible way out
DocBook
Document Type Definitions
Other DTDs
The DocBook toolchain
Migration tools
Editing tools
Related standards and practices
SGML
XML-DocBook References
How to write Unix documentation
17. Open Source
Unix and open source
Best practices for working with open-source developers
Good patching practice
Good project- and archive- naming practice
Good development practice
Good distribution-making practice
Good communication practice
The logic of licenses: how to pick one
Why you should use a standard license
Varieties of Open-Source Licensing
X Consortium License
BSD Classic License
Artistic License
General Public License
Mozilla Public License
18. Futures
Essence and accident in Unix tradition
Problems in the design of Unix
A Unix file is just a big bag of bytes
File deletion is forever
The Unix API doesn't use exceptions
ioctl(2) and fcntl(2) are an embarrassment
The Unix security model may be too primitive
Unix has too many different kinds of names for things
File systems might be considered harmful
Problems in the environment of Unix
Problems in the culture of Unix
Reasons to believe
A. Glossary of Abbreviations
B. References
C. Contributors

List of Figures

4.1. Qualitative plot of defect count and density vs. module size.
4.2. Caller/callee relationships in GIMP with a plugin loaded.
8.1. Taxonomy of languages.
8.2. Taxonomy of languages — the PIC source
11.1. Screen shot of the original Rogue game
11.2. Caller/callee relationships in a polyvalent program.
16.1. Processing structural documents
16.2. Present-day XML-DocBook toolchain
16.3. Future XML-DocBook toolchain with FOP
16.4. XML and SGML toolchains compared

List of Tables

9.1. Regular-expression examples
9.2. Introduction to regular-expression operations
12.1. Language choices on SourceForge, December 2002
12.2. Summary of X Toolkits

List of Examples

5.1. Password file example
5.2. A .newsrc example
5.3. A fortune file example
5.4. Three planets in an RFC822-like format
5.5. An XML example
5.6. A .INI file example
5.7. An SMTP session example
5.8. A POP3 example session
5.9. An IMAP session example
7.1. An example fetchmail -v transcript
8.1. Glade Hello, World
8.2. A sample m4 macro
8.3. Synthetic example of a fetchmailrc
9.1. Example of fetchmailrc syntax
9.2. Python structure dump of a fetchmail configuration
9.3. copy_instance metaclass code
9.4. Calling context for copy_instance
9.5. Desired output format for the star table
9.6. Master form of the star table
10.1. A .netrc example
10.2. X configuration example
16.1. troff(1) markup example
16.2. man markup example
17.1. Tar archive maker production