Using library functions in @command{awk} can be very beneficial. It encourages code reuse and the writing of general functions. Programs are smaller and therefore clearer. However, using library functions is only easy when writing @command{awk} programs; it is painful when running them, requiring multiple @option{-f} options. If @command{gawk} is unavailable, then so too is the @env{AWKPATH} environment variable and the ability to put @command{awk} functions into a library directory (see section Command-Line Options). It would be nice to be able to write programs in the following manner:
# library functions
@include getopt.awk
@include join.awk
...
# main program
BEGIN {
while ((c = getopt(ARGC, ARGV, "a:b:cde")) != -1)
...
...
}
The following program, `igawk.sh', provides this service. It simulates @command{gawk}'s searching of the @env{AWKPATH} variable and also allows nested includes; i.e., a file that is included with `@include' can contain further `@include' statements. @command{igawk} makes an effort to only include files once, so that nested includes don't accidentally include a library function twice.
@command{igawk} should behave just like @command{gawk} externally. This means it should accept all of @command{gawk}'s command-line arguments, including the ability to have multiple source files specified via @option{-f}, and the ability to mix command-line and library source files.
The program is written using the POSIX Shell (@command{sh}) command language. The way the program works is as follows:
The initial part of the program turns on shell tracing if the first
argument is `debug'. Otherwise, a shell trap statement
arranges to clean up any temporary files on program exit or upon an
interrupt.
The next part loops through all the command-line arguments. There are several cases of interest:
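--
    This ends the arguments to @command{igawk}. Anything else is passed on to the user's @command{awk} program without being evaluated.
-W
    This indicates that the next option is specific to @command{gawk}. To make argument processing easier, the `-W' is glued onto the front of the remaining arguments and the loop continues.
-v, -F
    These are saved and passed on to @command{gawk}.
-f, --file, --file=, -Wfile=
    The file name is appended to `/tmp/ig.s.$$' with an `@include' statement. The @command{sed} utility is used to remove the leading option part of the argument (e.g., `--file=').
--source, --source=, -Wsource=
    The source text is echoed into `/tmp/ig.s.$$'.
--version, -Wversion
    @command{igawk} prints its version number, runs `gawk --version' to get the @command{gawk} version information, and then exits.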
If none of the @option{-f}, @option{--file}, @option{-Wfile}, @option{--source}, or @option{-Wsource} arguments are supplied, then the first non-option argument should be the @command{awk} program. If there are no command-line arguments left, @command{igawk} prints an error message and exits. Otherwise, the first argument is echoed into `/tmp/ig.s.$$'. In any case, after the arguments have been processed, `/tmp/ig.s.$$' contains the complete text of the original @command{awk} program.
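The `-W' case is probably the least obvious of these, so here is a small sketch that isolates it. The function name show_glued is made up for illustration and is not part of @command{igawk}; it only repeats the two statements from that case and prints the resulting argument list:

```sh
#!/bin/sh
# Sketch only: show_glued is a hypothetical helper, not part of igawk.
show_glued() {
    shift                  # discard the bare -W, as the case branch does
    set -- -W"$@"          # glue -W onto what is now the first argument
    printf '%s\n' "$@"     # print the new argument list, one per line
}

show_glued -W file=getopt.awk data.txt
```

After the rewrite, the first argument reads `-Wfile=getopt.awk', so the next trip through the loop can handle it with the same pattern that handles `--file=getopt.awk'.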
The `$$' in @command{sh} represents the current process ID number. It is often used in shell programs to generate unique temporary file names. This allows multiple users to run @command{igawk} without worrying that the temporary file names will clash. The program is as follows:
#! /bin/sh

# igawk -- like gawk but do @include processing

if [ "$1" = debug ]
then
    set -x
    shift
else
    # cleanup on exit, hangup, interrupt, quit, termination
    trap 'rm -f /tmp/ig.[se].$$' 0 1 2 3 15
fi

while [ $# -ne 0 ] # loop over arguments
do
    case $1 in
    --)     shift; break;;

    -W)     shift
            set -- -W"$@"
            continue;;

    -[vF])  opts="$opts $1 '$2'"
            shift;;

    -[vF]*) opts="$opts '$1'" ;;

    -f)     echo @include "$2" >> /tmp/ig.s.$$
            shift;;

    -f*)    f=`echo "$1" | sed 's/-f//'`
            echo @include "$f" >> /tmp/ig.s.$$ ;;

    -?file=*)    # -Wfile or --file
            f=`echo "$1" | sed 's/-.file=//'`
            echo @include "$f" >> /tmp/ig.s.$$ ;;

    -?file)      # get arg, $2
            echo @include "$2" >> /tmp/ig.s.$$
            shift;;

    -?source=*)  # -Wsource or --source
            t=`echo "$1" | sed 's/-.source=//'`
            echo "$t" >> /tmp/ig.s.$$ ;;

    -?source)    # get arg, $2
            echo "$2" >> /tmp/ig.s.$$
            shift;;

    -?version)
            echo igawk: version 1.0 1>&2
            gawk --version
            exit 0 ;;

    -[W-]*) opts="$opts '$1'" ;;

    *)      break;;
    esac
    shift
done

if [ ! -s /tmp/ig.s.$$ ]
then
    if [ -z "$1" ]
    then
        echo igawk: no program! 1>&2
        exit 1
    else
        echo "$1" > /tmp/ig.s.$$
        shift
    fi
fi

# at this point, /tmp/ig.s.$$ has the program
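To see what the @command{sed} commands in the file-name cases leave behind, the following sketch runs them on their own. The function name strip_file_opt and the sample file name are assumptions made for this illustration; the patterns and @command{sed} expressions are copied from the program above:

```sh
#!/bin/sh
# Sketch only: strip_file_opt is a hypothetical helper that repeats
# the sed commands from the -f* and -?file=* cases.
strip_file_opt() {
    case $1 in
    -f*)      echo "$1" | sed 's/-f//' ;;        # -fNAME
    -?file=*) echo "$1" | sed 's/-.file=//' ;;   # --file=NAME or -Wfile=NAME
    esac
}

strip_file_opt -fgetopt.awk
strip_file_opt --file=getopt.awk
strip_file_opt -Wfile=getopt.awk
```

All three spellings reduce to the bare file name, which is what ends up after `@include' in the temporary file.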
The @command{awk} program to process `@include' directives
reads through the program, one line at a time, using getline
(see section Explicit Input with getline). The input
file names and `@include' statements are managed using a stack.
As each `@include' is encountered, the current file name is
"pushed" onto the stack and the file named in the `@include'
directive becomes the current file name. As each file is finished,
the stack is "popped," and the previous input file becomes the current
input file again. The process is started by making the original file
the first one on the stack.
The pathto function does the work of finding the full path to
a file. It simulates @command{gawk}'s behavior when searching the
@env{AWKPATH} environment variable
(see section The @env{AWKPATH} Environment Variable).
If a file name has a `/' in it, no path search is done. Otherwise,
the file name is concatenated with the name of each directory in
the path, and an attempt is made to open the generated file name.
The only way to test if a file can be read in @command{awk} is to go
ahead and try to read it with getline; this is what pathto
does. (On some very old versions of @command{awk}, the test
`getline junk < t' can loop forever if the file exists but is empty.
Caveat emptor.) If the file can be read, it is closed and the file name
is returned:
gawk -- '
# process @include directives

function pathto(file,    i, t, junk)
{
    if (index(file, "/") != 0)
        return file

    for (i = 1; i <= ndirs; i++) {
        t = (pathlist[i] "/" file)
        if ((getline junk < t) > 0) {
            # found it
            close(t)
            return t
        }
    }
    return ""
}
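The readability test can be tried in isolation. The sketch below is not part of @command{igawk}; probe is a made-up wrapper, and it uses plain @command{awk} on the assumption that any POSIX-style @command{awk} is installed. It prints the value that getline returns for a non-empty file, an empty file, and a missing file:

```sh
#!/bin/sh
# Sketch only: probe is a hypothetical wrapper that reports what
# getline returns when reading from the named file.
probe() {
    awk 'BEGIN {
        t = ARGV[1]
        r = (getline junk < t)   # 1: read a line, 0: end of file, -1: error
        close(t)
        print r
    }' "$1"
}

f=/tmp/igprobe.$$
echo 'some text' > "$f"
probe "$f"            # a readable, non-empty file
: > "$f"
probe "$f"            # an empty file
rm -f "$f"
probe "$f"            # a file that does not exist
```

Only the first case yields a value greater than zero. An empty file returns zero, so a test of the form `(getline junk < t) > 0' treats an empty library file the same way as one that cannot be found.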
The main program is contained inside one BEGIN rule. The first thing it
does is set up the pathlist array that pathto uses. After
splitting the path on `:', null elements are replaced with ".",
which represents the current directory:
BEGIN {
    path = ENVIRON["AWKPATH"]
    ndirs = split(path, pathlist, ":")
    for (i = 1; i <= ndirs; i++) {
        if (pathlist[i] == "")
            pathlist[i] = "."
    }
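The effect of the null-element replacement is easiest to see with a path that has a leading and a trailing colon. In this sketch, split_path is a made-up function name and the directory is only an example value; the @command{awk} statements mirror the ones above, with plain @command{awk} assumed to be available:

```sh
#!/bin/sh
# Sketch only: split_path is a hypothetical helper that prints the
# directory list built from a colon-separated path.
split_path() {
    awk -v path="$1" 'BEGIN {
        n = split(path, dirs, ":")
        for (i = 1; i <= n; i++) {
            if (dirs[i] == "")
                dirs[i] = "."        # null element means current directory
            print dirs[i]
        }
    }'
}

split_path ":/usr/local/share/awk:"
```

The leading and trailing colons each produce a `.' entry. Note also that an empty path splits into zero elements, so with this approach nothing is searched when @env{AWKPATH} is unset or empty; only file names containing a `/' would then be found.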
The stack is initialized with ARGV[1], which will be `/tmp/ig.s.$$'.
The main loop comes next. Input lines are read in succession. Lines that
do not start with `@include' are printed verbatim.
If the line does start with `@include', the file name is in $2.
pathto is called to generate the full path. If it cannot, then we
print an error message and continue.
The next thing to check is if the file is included already. The
processed array is indexed by the full file name of each included
file and it tracks this information for us. If the file is
seen again, a warning message is printed. Otherwise, the new file name is
pushed onto the stack and processing continues.
Finally, when getline encounters the end of the input file, the file
is closed and the stack is popped. When stackptr is less than zero,
the program is done:
    stackptr = 0
    input[stackptr] = ARGV[1] # ARGV[1] is first file

    for (; stackptr >= 0; stackptr--) {
        while ((getline < input[stackptr]) > 0) {
            if (tolower($1) != "@include") {
                print
                continue
            }
            fpath = pathto($2)
            if (fpath == "") {
                printf("igawk:%s:%d: cannot find %s\n",
                    input[stackptr], FNR, $2) > "/dev/stderr"
                continue
            }
            if (! (fpath in processed)) {
                processed[fpath] = input[stackptr]
                input[++stackptr] = fpath  # push onto stack
            } else
                print $2, "included in", input[stackptr],
                    "already included in",
                    processed[fpath] > "/dev/stderr"
        }
        close(input[stackptr])
    }
}' /tmp/ig.s.$$ > /tmp/ig.e.$$
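The stack behavior can be watched on a small example. What follows is a condensed sketch, not the program above: it leaves out the @env{AWKPATH} search and the warning messages, uses plain @command{awk}, and the names expand, seen, and sp, as well as the sample files, are invented for the illustration:

```sh
#!/bin/sh
# Sketch only: a condensed copy of the expansion loop. "expand" is a
# hypothetical name; path searching and warnings are omitted.
expand() {
    awk 'BEGIN {
        sp = 0
        input[sp] = ARGV[1]                 # bottom of the stack
        for (; sp >= 0; sp--) {             # pop when a file runs out
            while ((getline < input[sp]) > 0) {
                if (tolower($1) != "@include") {
                    print
                    continue
                }
                if (!($2 in seen)) {        # include each file only once
                    seen[$2] = 1
                    input[++sp] = $2        # push; reading switches files
                }
            }
            close(input[sp])
        }
    }' "$1"
}

# Made-up sample files: main includes mid, mid includes lib,
# and main then asks for lib a second time.
dir=/tmp/igdemo.$$
mkdir "$dir" || exit 1
echo 'function lib() { return 1 }' > "$dir/lib.awk"
{ echo "@include $dir/lib.awk"
  echo 'function mid() { return lib() }'; } > "$dir/mid.awk"
{ echo "@include $dir/mid.awk"
  echo "@include $dir/lib.awk"
  echo 'BEGIN { print mid() }'; } > "$dir/main.awk"

expand "$dir/main.awk"
rm -rf "$dir"
```

The output contains the lib function once, then mid, then the BEGIN rule. Because a file is not closed when a nested file is pushed, reading resumes at the line after the `@include' once the nested file is popped.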
The last step is to call @command{gawk} with the expanded program, along with the original options and command-line arguments that the user supplied. @command{gawk}'s exit status is passed back on to @command{igawk}'s calling program:
eval gawk -f /tmp/ig.e.$$ $opts -- "$@"

exit $?
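The reason for eval is that the saved options were stored in opts wrapped in single quotes, and those quotes only take effect if the shell parses the text a second time. The following sketch shows the difference; count_args is a made-up function and the option value is just an example:

```sh
#!/bin/sh
# Sketch only: count_args is a hypothetical helper that reports how
# many arguments it received.
count_args() { echo $#; }

opts=""
opts="$opts -v 'msg=hello world'"   # saved the way the -[vF] case saves it

eval count_args $opts    # quotes are re-parsed: the value stays one argument
count_args $opts         # plain expansion splits the value at the space
```

With eval the function sees two arguments; without it, three. One limitation of this quoting scheme, as far as I can tell, is that an option value that itself contains a single quote would not survive the second parse.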
This version of @command{igawk} represents my third attempt at this program. There are three key simplifications that make the program work better:
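Using `@include' even for the files named with @option{-f} makes building
the initial collected @command{awk} program much simpler; all the
`@include' processing can be done once.
The pathto function doesn't try to save the line read with
getline when testing for the file's accessibility. Trying to save
this line for use with the main program complicates things considerably.
Using a getline loop in the BEGIN rule does it all in one
place. It is not necessary to call out to a separate loop for processing
nested `@include' statements.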
Also, this program illustrates that it is often worthwhile to combine @command{sh} and @command{awk} programming. You can usually accomplish quite a lot without having to resort to low-level programming in C or C++, and certain kinds of string and argument manipulation are frequently easier in the shell than in @command{awk}.
Finally, @command{igawk} shows that it is not always necessary to add new features to a program; they can often be layered on top. With @command{igawk}, there is no real reason to build `@include' processing into @command{gawk} itself.
As an additional example of this, consider the idea of having two files in a directory in the search path:
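`default.awk'
    This file would contain a set of default library functions, such as getopt and assert.
`site.awk'
    This file would contain library functions that are specific to a site or installation; i.e., locally developed functions.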
One user suggested that @command{gawk} be modified to automatically read these files upon startup. Instead, it would be very simple to modify @command{igawk} to do this. Since @command{igawk} can process nested `@include' directives, `default.awk' could simply contain `@include' statements for the desired library functions.
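One way such a modification might look is sketched below. This is an assumption about how it could be done, not something @command{igawk} contains: prepend_default is a made-up function that puts an `@include' line for the default library in front of the collected program, and it would be called after the argument loop, just before the expansion step:

```sh
#!/bin/sh
# Sketch only: prepend_default is a hypothetical addition to igawk.
# $1 is the file holding the collected program, $2 the library name.
prepend_default() {
    tmp=$1.new
    { echo "@include $2"; cat "$1"; } > "$tmp" && mv "$tmp" "$1"
}

# Example use on a stand-in for /tmp/ig.s.$$
src=/tmp/igdef.$$
echo 'BEGIN { print "hello" }' > "$src"
prepend_default "$src" default.awk
cat "$src"
rm -f "$src"
```

Doing this after the program text has been collected leaves the earlier check for an empty program file untouched.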