| Autoconf, Automake, and Libtool | ||
|---|---|---|
| | Writing Portable C++ with GNU Autotools | |
C++ compilers are complex pieces of software. Sadly, the details of a compiler's implementation sometimes leak out and bother the application programmer. The two aspects of C++ compiler implementation that have caused grief in the past are efficient template instantiation and name mangling. Both are explained below.
The problem with template instantiation exists because of a number of complex constraints:
The compiler should only generate an instance of a template once, to speed the compilation process.
The linker needs to be smart about where to locate the object code for instantiations produced by the compiler.
This problem is exacerbated by separate compilation--that is, the method bodies for List<T> may be located in a header file or in a separate compilation unit. These files may even be in a different directory than the current directory!
Life is easy for the compiler when the template definition appears in the same compilation unit as the site of the instantiation--everything that is needed is known:
template <class T> class List
{
private:
  T* head;
  T* current;
};
List<int> li;
This becomes significantly more difficult when the site of a template instantiation and the template definition are split between two different compilation units. In [Linkers and Loaders], Levine describes in detail how the compiler driver deals with this by iteratively attempting to link a final executable and noting, from the undefined symbol errors produced by the linker, which template instantiations must be performed to link the program successfully.
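Where the language feature is available, explicit instantiation is one way to take the guesswork out of this situation: the programmer names the one place where a given instantiation is emitted. The following is only a minimal single-file sketch of the idea. The file names list.h and list.cxx, the fixed capacity, the members push and size, and the helper count_three are all assumptions made for illustration and are not part of the List shown above.

```cpp
#include <cassert>

// --- what a header (hypothetically list.h) would declare ---
template <class T> class List
{
public:
  List ();
  void push (const T &value);
  int size () const;
private:
  enum { capacity = 8 };   // fixed capacity keeps the sketch short
  T items[capacity];
  int used;
};

// --- what a separate source file (hypothetically list.cxx) would define ---
template <class T> List<T>::List () : used (0) {}

template <class T> void List<T>::push (const T &value)
{
  assert (used < capacity);   // the sketch does not grow the storage
  items[used++] = value;
}

template <class T> int List<T>::size () const { return used; }

// Explicit instantiation: emit the members of List<int> here, once.
template class List<int>;

// --- what a client compilation unit would do ---
int count_three ()
{
  List<int> li;
  li.push (1);
  li.push (2);
  li.push (3);
  return li.size ();
}
```

In a real project the three marked parts would live in separate files, and a client would need only the declarations plus the object file holding the explicit instantiation. The cost is that every type the template is used with has to be listed by hand.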
In large projects where templates may be instantiated in multiple locations, the compiler may generate instantiations multiple times for the same type. Not only does this slow down compilation, but it can result in some difficult problems for linkers which refuse to link object files containing duplicate symbols. Suppose there is the following directory layout:
src
|
`--- core
|    `--- core.cxx
`--- modules
|    `--- http.cxx
`--- lib
     `--- stack.h
If the compiler generates core.o in the core directory and libhttp.a in the modules directory, the final link may fail because libhttp.a and the final executable may contain duplicate symbols--those symbols generated as a result of both http.cxx and core.cxx instantiating, say, a Stack<int>. Some linkers, such as the one provided with AIX, allow duplicate symbols during a link, but many do not.
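With a C++11 or later compiler, an explicit instantiation declaration (extern template) is one way to tell each compilation unit not to emit its own copy of Stack<int>. The sketch below collapses the layout above into a single file so that it can be compiled on its own. The Stack members, the helper sum_and_drain, and the file lib/stack.cxx are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <vector>

// --- lib/stack.h: the template shared by core.cxx and http.cxx ---
template <class T> class Stack
{
public:
  void push (const T &value);
  T pop ();
  bool empty () const;
private:
  std::vector<T> items;
};

template <class T> void Stack<T>::push (const T &value)
{
  items.push_back (value);
}

template <class T> T Stack<T>::pop ()
{
  assert (!items.empty ());
  T top = items.back ();
  items.pop_back ();
  return top;
}

template <class T> bool Stack<T>::empty () const
{
  return items.empty ();
}

// Explicit instantiation declaration (C++11): Stack<int> is instantiated
// elsewhere, so an including compilation unit should not emit its own copy.
extern template class Stack<int>;

// --- code of the kind core.cxx and http.cxx might both contain ---
int sum_and_drain (Stack<int> &s)
{
  int total = 0;
  while (!s.empty ())
    total += s.pop ();
  return total;
}

// --- exactly one source file (hypothetically lib/stack.cxx) ---
// Explicit instantiation definition: the single place the code is emitted.
template class Stack<int>;
```

Compilers that predate C++11 may offer a similar facility only as a vendor extension, so a portable package would need to test for the feature before relying on it.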
Some compilers have solved this problem by maintaining a template repository of template instantiations. Usually, the entire template definition is expanded with the specified type parameters and compiled into the repository, leaving the linker to collect the required object files at link time.
The main portability concern with repositories is getting your compiler to maintain a single repository across your entire project. This often requires a vendor-specific command line option to the compiler, which detracts from portability. It is conceivable that Libtool could come to the rescue here in the future.
Early C++ compilers mangled the names of C++ symbols so that existing linkers could be used without modification. The cfront C++ translator also mangled names so that information from the original C++ program would not be lost in the translation to C. Today, name mangling remains important for enabling overloaded function names and link-time type checking. Here is an example C++ source file which illustrates name mangling in action:
class Foo
{
public:
  Foo ();
  void go ();
  void go (int where);
private:
  int pos;
};

Foo::Foo ()
{
  pos = 0;
}

void
Foo::go ()
{
  go (0);
}

void
Foo::go (int where)
{
  pos = where;
}

int
main ()
{
  Foo f;
  f.go (10);
}
$ g++ -Wall -c example.cxx -o example.o
$ nm --defined-only example.o
00000000 T __3Foo
00000000 ? __FRAME_BEGIN__
00000000 t gcc2_compiled.
0000000c T go__3Foo
0000002c T go__3Fooi
00000038 T main
Even though Foo contains two methods with the same name, their argument lists (one taking an int, one taking no arguments) differentiate them once their names are mangled. The go__3Fooi symbol is the version that takes an int argument, and the __3Foo symbol is the constructor for Foo. The GNU binutils package includes a utility called c++filt that can demangle names. Proprietary tool chains sometimes include a similar utility, although with a bit of imagination you can often demangle names in your head.
$ nm --defined-only example.o | c++filt
00000000 T Foo::Foo(void)
00000000 ? __FRAME_BEGIN__
00000000 t gcc2_compiled.
0000000c T Foo::go(void)
0000002c T Foo::go(int)
00000038 T main
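Demangling can also be done from within a program. The sketch below assumes a GCC or Clang tool chain that follows the Itanium C++ ABI and provides abi::__cxa_demangle in <cxxabi.h>. That ABI mangles names differently from the older scheme shown in the listings above, so Foo::go(int) appears as _ZN3Foo2goEi rather than go__3Fooi. The wrapper function demangle is a hypothetical helper written for this example.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang, Itanium C++ ABI)
#include <cassert>
#include <cstdlib>    // std::free
#include <string>

// Return the readable form of an Itanium-ABI mangled name, or the input
// unchanged when it cannot be demangled.
std::string demangle (const char *mangled)
{
  assert (mangled != 0);
  int status = 0;
  char *readable = abi::__cxa_demangle (mangled, 0, 0, &status);
  if (status != 0 || readable == 0)
    return mangled;
  std::string result (readable);
  std::free (readable);   // the buffer was allocated with malloc
  return result;
}
```

Calling demangle ("_ZN3Foo2goEi") yields Foo::go(int) on such a tool chain. Other compilers use other schemes and other interfaces, so code like this belongs behind a configure-time check.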
Name mangling algorithms differ between C++ implementations, so object files produced by one tool chain generally cannot be linked with those produced by another. This is deliberate: other aspects of the object files, such as the calling convention used for making function calls, may make them incompatible, and mismatched mangled names turn that incompatibility into a link-time error rather than a run-time failure.
This implies that C++ libraries and packages cannot be practically distributed in binary form. Of course, you were intending to distribute the source code to your package anyway, weren't you?