A Small GNU Autotools Project

This chapter introduces a small--but real--worked example, to illustrate some of the features, and highlight some of the pitfalls, of the GNU Autotools discussed so far. All of the source can be downloaded from the book's web page [1]. The text is peppered with my own pet ideas, accumulated over several years of working with the GNU Autotools, and you should be able to easily apply these to your own projects. I will begin by describing some of the choices and problems I encountered during the early stages of the development of this project. Then, by way of illustration of the issues covered, I will move on to showing you a general infrastructure that I use as the basis for all of my own projects, followed by the specifics of the implementation of a portable command line shell library. This chapter then finishes with a sample shell application that uses that library.

Later, in the chapter called A Large GNU Autotools Project and the chapter called A Complex GNU Autotools Project, the example introduced here will be gradually expanded as new features of GNU Autotools are revealed.

GNU Autotools in Practice

This section details some of the specific problems I encountered when starting this project, and is representative of the sorts of things you are likely to want to do in projects of your own, but for which the correct solution may not be immediately evident. You can always refer back to this section for some inspiration if you come across similar situations. I will talk about some of the decisions I made about the structure of the project, and also the trade-offs for the other side of the argument - you might find the opposite choice to the one I make here is more relevant to a particular project of yours.

Project Directory Structure

Before starting to write code for any project, you need to decide on the directory structure you will use to organise the code. I like to build each component of a project in its own subdirectory, and to keep the configuration sources separate from the source code. The great majority of GNU projects I have seen use a similar method, so adopting it yourself will likely make your project more familiar to your developers by association.

The top level directory is used for configuration files, such as configure and aclocal.m4, and for a few other sundry files, README and a copy of the project license for example.

Any significant libraries will have a subdirectory of their own, containing all of the sources and headers for that library along with a Makefile.am and anything else that is specific to just that library. Libraries that belong to a small group of like libraries, a set of pluggable application modules for example, are kept together in a single directory.

The sources and headers for the project's main application will be stored in yet another subdirectory, traditionally named src. There are other conventional directories your developers might expect too: a doc directory for project documentation, and a test directory for the project's self-test suite.

To keep the project top-level directory as uncluttered as possible, as I like to do, you can take advantage of Autoconf's AC_CONFIG_AUX_DIR by creating another directory, say config, which will be used to store many of the GNU Autotools intermediate files, such as install-sh. I always store all project specific Autoconf M4 macros in this same subdirectory.

So, this is what you should start with:
     $ pwd
     ~/mypackage
     $ ls -F
     Makefile.am  config/     configure.in  lib/  test/
     README       configure*  doc/          src/
      

C Header Files

There is a small amount of boiler-plate that should be added to all header files, not least of which is a small amount of code to prevent the contents of the header from being scanned multiple times. This is achieved by enclosing the entire file in a preprocessor conditional which evaluates to false after the first time it has been seen by the preprocessor. Traditionally, the macro used is in all upper case, and named after the installation path without the installation prefix. Imagine a header that will be installed to /usr/local/include/sys/foo.h, for example. The preprocessor code would be as follows:
     #ifndef SYS_FOO_H
     #define SYS_FOO_H 1
     ...
     #endif /* !SYS_FOO_H */
      

Apart from comments, the entire content of the rest of this header file must be between these few lines. It is worth mentioning that inside the enclosing ifndef, the macro SYS_FOO_H must be defined before any other files are #included. It is a common mistake not to define that macro until the end of the file, but mutual dependency cycles are only stalled if the guard macro is defined before the #include which starts that cycle [2].
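
As a minimal sketch of the guard at work, the fragment below writes the body of a hypothetical header out twice in a single file, as though it had been #included twice. The struct foo type and the foo_value function are assumptions made up for the illustration; only the SYS_FOO_H guard comes from the text above.

```c
/* First pass over the (hypothetical) header body. */
#ifndef SYS_FOO_H
#define SYS_FOO_H 1

struct foo { int value; };

#endif /* !SYS_FOO_H */

/* Second pass, as if the header had been #included again.  SYS_FOO_H
   is already defined, so the preprocessor skips everything up to the
   #endif and struct foo is not defined a second time.  */
#ifndef SYS_FOO_H
#define SYS_FOO_H 1

struct foo { int value; };

#endif /* !SYS_FOO_H */

/* Uses the single surviving definition of struct foo.  */
int
foo_value (const struct foo *f)
{
  return f->value;
}
```

Without the guard, the second definition of struct foo would be rejected by the compiler as a redefinition.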

If a header is designed to be installed, it must #include other installed project headers from the local tree using angle-brackets. There are some implications to working like this:

  • You must be careful that the names of header file directories in the source tree match the names of the directories in the install tree. For example, when I plan to install the aforementioned foo.h to /usr/local/include/project/foo.h, from which it will be included using #include <project/foo.h>, then in order for the same include line to work in the source tree, I must name the source directory it is installed from project too, or other headers which use it will not be able to find it until after it has been installed.

  • When you come to developing the next version of a project laid out in this way, you must be careful about finding the correct header. Automake takes care of that for you by using -I options that force the compiler to look for uninstalled headers in the current source directory before searching the system directories for installed headers of the same name.

  • You don't have to install all of your headers to /usr/include - you can use subdirectories. And all without having to rewrite the headers at install time.

C++ Compilers

In order for a C++ program to use a library compiled with a C compiler, it is necessary for any symbols exported from the C library to be declared between extern "C" { and }. This code is important, because a C++ compiler mangles [3] all variable and function names, whereas a C compiler does not. On the other hand, a C compiler will not understand these lines, so you must be careful to make them invisible to the C compiler.

Sometimes you will see this method used, written out in long hand in every installed header file, like this:
     #ifdef __cplusplus
     extern "C" {
     #endif     
     ...
     
     #ifdef __cplusplus
     }
     #endif

But that is a lot of unnecessary typing if you have a few dozen headers in your project. Also the additional braces tend to confuse text editors, such as emacs, which do automatic source indentation based on brace characters.

Far better, then, to declare them as macros in a common header file, and use the macros in your headers:
     #ifdef __cplusplus
     #  define BEGIN_C_DECLS extern "C" {
     #  define END_C_DECLS   }
     #else /* !__cplusplus */
     #  define BEGIN_C_DECLS
     #  define END_C_DECLS
     #endif /* __cplusplus */
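
A header that uses these macros might then look something like the following sketch. The header name project/counter.h, its guard macro and the counter_next function are hypothetical, chosen only to show where the two macros sit relative to the include guard and the declarations; the macro definitions are repeated so that the fragment stands alone.

```c
/* In a real project these definitions would live in a common header. */
#ifdef __cplusplus
#  define BEGIN_C_DECLS extern "C" {
#  define END_C_DECLS   }
#else /* !__cplusplus */
#  define BEGIN_C_DECLS
#  define END_C_DECLS
#endif /* __cplusplus */

/* Contents of a hypothetical installed header, project/counter.h.  */
#ifndef PROJECT_COUNTER_H
#define PROJECT_COUNTER_H 1

BEGIN_C_DECLS

extern int counter_next (void);

END_C_DECLS

#endif /* !PROJECT_COUNTER_H */

/* The C implementation, as it would appear in the library source.  */
int
counter_next (void)
{
  static int count = 0;

  return ++count;
}
```

To a C compiler both macros expand to nothing, so the declaration is seen exactly as written; a C++ compiler sees it wrapped in extern "C" { and }.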

I have seen several projects that name such macros with a leading underscore - _BEGIN_C_DECLS. Any symbol with a leading underscore is reserved for use by the compiler implementation, so you shouldn't name any symbols of your own in this way. By way of example, I recently ported the Small [4] language compiler to Unix, and almost all of the work was writing a Perl script to rename huge numbers of symbols in the compiler's reserved namespace to something more sensible so that GCC could even parse the sources. Small was originally developed on Windows, and the author had used a lot of symbols with a leading underscore. Although his symbol names didn't clash with his own compiler, in some cases they were the same as symbols used by GCC.

Function Definitions

As a stylistic convention, the return types for all function definitions should be on a separate line. The main reason for this is that it makes it very easy to find the functions in a source file, by looking for a single identifier at the start of a line followed by an open parenthesis:
     $ egrep '^[_a-zA-Z][_a-zA-Z0-9]*[ \t]*\(' error.c
     set_program_name (const char *path)
     error (int exit_status, const char *mode, const char *message)
     sic_warning (const char *message)
     sic_error (const char *message)
     sic_fatal (const char *message)
      

There are emacs lisp functions and various code analysis tools, such as ansi2knr (see the section called K&R Compilers), which rely on this formatting convention, too. Even if you don't use those tools yourself, your fellow developers might like to, so it is a good convention to adopt.
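
For illustration, a definition laid out in this style might look like the sketch below. The path_basename function is a hypothetical example rather than part of the project's sources; the point is only that the return type sits on its own line, so the function name starts in the first column where the egrep pattern shown earlier can find it.

```c
#include <string.h>

/* Return the part of PATH after the last '/', or PATH itself if it
   contains no '/'.  The return type is on a line of its own, so the
   identifier path_basename begins the following line.  */
const char *
path_basename (const char *path)
{
  const char *slash = strrchr (path, '/');

  return slash ? slash + 1 : path;
}
```

Calls to the function elsewhere in the file are normally indented, so they do not match the pattern.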

Fallback Function Implementations

Due to the huge number of Unix varieties in common use today, many of the C library functions that you take for granted on your preferred development platform are very likely missing from some of the architectures you would like your code to compile on. Fundamentally there are two ways to cope with this:

  • Use only the few library calls that are available everywhere. In reality this is not actually possible because there are two lowest common denominators with mutually exclusive APIs, one rooted in BSD Unix (bcopy, rindex) and the other in SYSV Unix (memcpy, strrchr). The only way to deal with this is to define one API in terms of the other using the preprocessor. The newer POSIX standard deprecates many of the BSD originated calls (with exceptions such as the BSD socket API). Even on non-POSIX platforms, there has been so much cross pollination that often both varieties of a given call may be provided; however, you would be wise to write your code using POSIX endorsed calls, and where they are missing, define them in terms of whatever the host platform provides. This approach requires a lot of knowledge about various system libraries and standards documents, and can leave you with reams of preprocessor code to handle the differences between APIs. You will also need to perform a lot of checking in configure.in to figure out which calls are available. For example, to allow the rest of your code to use the strcpy call with impunity, you would need the following code in configure.in:
              AC_CHECK_FUNCS(strcpy bcopy)
               
    And the following preprocessor code in a header file that is seen by every source file:
              #if !HAVE_STRCPY
              #  if HAVE_BCOPY
              #    define strcpy(dest, src)   bcopy (src, dest, 1 + strlen (src))
              #  else /* !HAVE_BCOPY */
                   error no strcpy or bcopy
              #  endif /* HAVE_BCOPY */
              #endif /* HAVE_STRCPY */
               

  • Alternatively you could provide your own fallback implementations of function calls you know are missing on some platforms. In practice you don't need to be as knowledgeable about problematic functions when using this approach. You can look in GNU libiberty [5] or François Pinard's libit project [6] to see for which functions other GNU developers have needed to implement fallback code. The libit project is especially useful in this respect as it comprises canonical versions of fallback functions, and suitable Autoconf macros assembled from across the entire GNU project. I won't give an example of setting up your package to use this approach, since that is how I have chosen to structure the project described in this chapter.

Rather than writing code to the lowest common denominator of system libraries, I am a strong advocate of the latter school of thought in the majority of cases. As with all things it pays to take a pragmatic approach; don't be afraid of the middle ground - weigh the options on a case by case basis.
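
To give a flavour of what a fallback function looks like in isolation, here is a minimal sketch of a replacement for strrchr. The name fallback_strrchr is hypothetical and is used only so that the fragment can be compiled on a system that already has strrchr; in a real package the file would typically define strrchr itself and be compiled only when configure reports the function missing (for example by way of Autoconf's AC_REPLACE_FUNCS). This is an illustration under those assumptions, not code taken from libiberty or libit.

```c
#include <stddef.h>

/* Hypothetical fallback for strrchr: return a pointer to the last
   occurrence of CH in STR, or NULL if CH does not occur.  Like the
   standard function, the terminating '\0' counts as part of the
   string, so searching for '\0' returns a pointer to the terminator.  */
char *
fallback_strrchr (const char *str, int ch)
{
  const char *last = NULL;

  do
    {
      if (*str == (char) ch)
        last = str;
    }
  while (*str++ != '\0');

  return (char *) last;
}
```

The rest of the code can then call strrchr unconditionally, with no preprocessor conditionals at the call sites.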

K&R Compilers

K&R C is the name now used to describe the original C language specified by Brian Kernighan and Dennis Ritchie (hence, `K&R'). I have yet to see a C compiler that doesn't support code written in the K&R style, yet it has fallen very much into disuse in favor of the newer ANSI C standard. Although it is increasingly common for vendors to unbundle their ANSI C compiler, the GCC project [7] is available for all of the architectures I have ever used.

There are four differences between the two C standards:

  1. ANSI C expects full type specification in function prototypes, such as you might supply in a library header file:
              extern int functionname (const char *parameter1, size_t parameter2);
               
    The nearest equivalent in K&R style C is a forward declaration, which allows you to use a function before its corresponding definition:
              extern int functionname ();
               
    As you can imagine, K&R has very bad type safety, and does not perform any checks that only function arguments of the correct type are used.

  2. The function headers of each function definition are written differently. Where you might see the following written in ANSI C:
              int
              functionname (const char *parameter1, size_t parameter2)
              {
                ...
              }
               
    K&R expects the parameter type declarations separately, like this:
              int
              functionname (parameter1, parameter2)
                   const char *parameter1;
                   size_t parameter2;
              {
                ...
              }
               

  3. There is no concept of an untyped pointer in K&R C. Where you might be used to seeing void * pointers in ANSI code, you are forced to overload the meaning of char * for K&R compilers.

  4. Variadic functions are handled with a different API in K&R C, imported with #include <varargs.h>. A K&R variadic function definition looks like this:
              int
              functionname (va_alist)
                   va_dcl
              {
                va_list ap;
                char *arg;
              
                va_start (ap);
                ...
                arg = va_arg (ap, char *);
                ...
                va_end (ap);
              
                return arg ? strlen (arg) : 0;
              }
               
    ANSI C provides a similar API, imported with #include <stdarg.h>, though it cannot express a variadic function with no named arguments such as the one above. In practice, this isn't a problem since you always need at least one parameter, either to specify the total number of arguments somehow, or else to mark the end of the argument list. An ANSI variadic function definition looks like this:
              int
              functionname (char *format, ...)
              {
                va_list ap;
                char *arg;
              
                va_start (ap, format);
                ...
                arg = va_arg (ap, char *);
                ...
                va_end (ap);
              
                return format ? strlen (format) : 0;
              }
               

Except in very rare cases where you are writing a low level project (GCC for example), you probably don't need to worry about K&R compilers too much. However, supporting them can be very easy, and if you are so inclined, can be handled either by employing the ansi2knr program supplied with Automake, or by careful use of the preprocessor.

Using ansi2knr in your project is described in some detail in the section called Automatic de-ANSI-fication in the Automake manual, but boils down to the following:

  • Add this macro to your configure.in file:
              AM_C_PROTOTYPES
               

  • Rewrite the contents of LIBOBJS and/or LTLIBOBJS in the following fashion:
              # This is necessary so that .o files in LIBOBJS are also built via
              # the ANSI2KNR-filtering rules.
              Xsed='sed -e "s/^X//"'
              LIBOBJS=`echo X"$LIBOBJS"|\
              	[$Xsed -e 's/\.[^.]* /.\$U& /g;s/\.[^.]*$/.\$U&/']`
               

Personally, I dislike this method, since every source file is filtered and rewritten with ANSI function prototypes and declarations converted to K&R style, adding a fair overhead in additional files in your build tree, and in compilation time. This would be reasonable were the abstraction sufficient to allow you to forget about K&R entirely, but ansi2knr is a simple program, and does not address any of the other differences between compilers that I raised above, and it cannot handle macros in your function prototypes or definitions. If you decide to use ansi2knr in your project, you must make the decision before you write any code, and be aware of its limitations as you develop.

For my own projects, I prefer to use a set of preprocessor macros along with a few stylistic conventions so that all of the differences between K&R and ANSI compilers are actually addressed, and so that the unfortunate few who have no access to an ANSI compiler (and who cannot use GCC for some reason) needn't suffer the overheads of ansi2knr.

The four differences in style listed at the beginning of this subsection are addressed as follows:

  1. The function prototype argument lists are declared inside a PARAMS macro invocation so that K&R compilers will still be able to compile the source tree. PARAMS removes ANSI argument lists from function prototypes for K&R compilers. Some developers continue to use __P for this purpose, but strictly speaking, macros starting with _ (and especially __) are reserved for the compiler and the system headers, so using PARAMS, as follows, is safer:
              #if __STDC__
              #  ifndef NOPROTOS
              #    define PARAMS(args)      args
              #  endif
              #endif
              #ifndef PARAMS
              #  define PARAMS(args)        ()
              #endif
               
    This macro is then used for all function declarations like this:
              extern int functionname PARAMS((const char *parameter));
               

  2. With the PARAMS macro used for all function declarations, ANSI compilers are given all the type information they require to do full compile time type checking. The function definitions proper must then be declared in K&R style so that K&R compilers don't choke on ANSI syntax. There is a small amount of overhead in writing code this way, however: the ANSI compile time type checking can only work in conjunction with K&R function definitions if it first sees an ANSI function prototype. This forces you to develop the good habit of prototyping every single function in your project. Even the static ones.

  3. The easiest way to work around the lack of void * pointers, is to define a new type that is conditionally set to void * for ANSI compilers, or char * for K&R compilers. You should add the following to a common header file:
              #if __STDC__
              typedef void *void_ptr;
              #else /* !__STDC__ */
              typedef char *void_ptr;
              #endif /* __STDC__ */
               

  4. The difference between the two variadic function APIs poses a stickier problem, and the solution is ugly. But it does work. First you must check for the headers in configure.in:
              AC_CHECK_HEADERS(stdarg.h varargs.h, break)
               
    Having done this, add the following code to a common header file:
              #if HAVE_STDARG_H
              #  include <stdarg.h>
              #  define VA_START(a, f)        va_start(a, f)
              #else
              #  if HAVE_VARARGS_H
              #    include <varargs.h>
              #    define VA_START(a, f)      va_start(a)
              #  endif
              #endif
              #ifndef VA_START
                error no variadic api
              #endif
               
    You must now supply each variadic function with both a K&R and an ANSI definition, like this:
              int
              #if HAVE_STDARG_H
              functionname (const char *format, ...)
              #else
              functionname (format, va_alist)
                   const char *format;
                   va_dcl
              #endif
              {
                va_list ap;
                char *arg;
              
                VA_START (ap, format);
                ...
                arg = va_arg (ap, char *);
                ...
                va_end (ap);
              
                return arg ? strlen (arg) : 0;
              }
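
Putting the first and last of these conventions together, a complete variadic function might be sketched as follows. The total_length function is a hypothetical example, and the fragment assumes that configure found <stdarg.h>, so only the ANSI branch of the definition is written out and VA_START is defined directly rather than by way of HAVE_STDARG_H. The PARAMS definition is also shortened by dropping the NOPROTOS test.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Keep the argument list for ANSI compilers, drop it otherwise.  */
#if __STDC__
#  define PARAMS(args)   args
#else
#  define PARAMS(args)   ()
#endif

/* Assumes <stdarg.h> is available; the <varargs.h> branch is omitted.  */
#define VA_START(a, f)   va_start (a, f)

extern size_t total_length PARAMS((const char *first, ...));

/* Sum the lengths of a NULL terminated list of strings.  The named
   parameter FIRST is the first string, and a NULL pointer marks the
   end of the argument list.  */
size_t
total_length (const char *first, ...)
{
  va_list ap;
  const char *arg;
  size_t total = 0;

  VA_START (ap, first);
  for (arg = first; arg != NULL; arg = va_arg (ap, const char *))
    total += strlen (arg);
  va_end (ap);

  return total;
}
```

A caller must terminate the list explicitly, for instance total_length ("ab", "cde", (const char *) NULL), since a variadic function has no other way to know where its arguments end.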
               

Notes

[1]

FIXME://where.would.this.be

[2]

An #include cycle is the situation where file a.h #includes file b.h, and b.h #includes file a.h -- either directly or through some longer chain of #includes.

[3]

For an explanation of name mangling, see the chapter called Writing Portable C++ with GNU Autotools.

[4]

http://www.compuphase.com/small.htm

[5]

Available at ftp://sourceware.cygnus.com/pub/binutils/.

[6]

Distributed from http://www.iro.umontreal.ca/~pinard/libit.

[7]

GCC must be compilable by K&R compilers so that it can be built and installed in an ANSI compiler free environment.