The egrep utility searches files for patterns.  It uses regular
expressions that are almost identical to those available in awk
(see section Regular Expressions).
You invoke it as follows:
egrep [options] 'pattern' files …
The pattern is a regular expression.  In typical usage, the regular
expression is quoted to prevent the shell from expanding any of the
special characters as file name wildcards.  Normally, egrep
prints the lines that matched.  If multiple file names are provided on
the command line, each output line is preceded by the name of the file
and a colon.
The options to egrep are as follows:
-c          Print out a count of the lines that matched the pattern, instead of the lines themselves.
-s          Be silent. No output is produced and the exit value indicates whether the pattern was matched.
-v          Invert the sense of the test. egrep prints the lines that do
            not match the pattern and exits successfully if the pattern is not
            matched.
-i          Ignore case distinctions in both the pattern and the input data.
-l          Only print (list) the names of the files that matched, not the lines that matched.
-e pattern  Use pattern as the regexp to match. The purpose of the -e option is to allow patterns that start with a ‘-’.
This version uses the getopt() library function
(see section Processing Command-Line Options)
and the file transition library program
(see section Noting Data file Boundaries).
The program begins with a descriptive comment and then a BEGIN rule
that processes the command-line arguments with getopt().  The -i
(ignore case) option is particularly easy with gawk; we just use the
IGNORECASE predefined variable
(see section Predefined Variables):
# egrep.awk --- simulate egrep in awk
#
# Options:
#    -c    count of lines
#    -s    silent - use exit value
#    -v    invert test, success if no match
#    -i    ignore case
#    -l    print filenames only
#    -e    argument is pattern
#
# Requires getopt and file transition library functions
BEGIN {
    while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
        if (c == "c")
            count_only++
        else if (c == "s")
            no_print++
        else if (c == "v")
            invert++
        else if (c == "i")
            IGNORECASE = 1
        else if (c == "l")
            filenames_only++
        else if (c == "e")
            pattern = Optarg
        else
            usage()
    }
Next comes the code that handles the egrep-specific behavior. If no
pattern is supplied with -e, the first nonoption on the
command line is used.  The awk command-line arguments up to ARGV[Optind]
are cleared, so that awk won’t try to process them as files.  If no
files are specified, the standard input is used, and if multiple files are
specified, we make sure to note this so that the file names can precede the
matched lines in the output:
    if (pattern == "")
        pattern = ARGV[Optind++]
    for (i = 1; i < Optind; i++)
        ARGV[i] = ""
    if (Optind >= ARGC) {
        ARGV[1] = "-"
        ARGC = 2
    } else if (ARGC - Optind > 1)
        do_filenames++
#    if (IGNORECASE)
#        pattern = tolower(pattern)
}
The last two lines are commented out, as they are not needed in
gawk.  They should be uncommented if you have to use another version
of awk.
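Clearing works because awk skips any element of ARGV that is the empty string when it looks for input files. The following shell sketch illustrates just that one point, apart from egrep.awk; the function name demo_cleared_argv and the file names one and two are invented for the illustration:

```sh
# Hypothetical demonstration; not part of egrep.awk.
demo_cleared_argv() {
    dir=$(mktemp -d) || return 1
    printf 'alpha\n' > "$dir/one"
    printf 'beta\n'  > "$dir/two"
    # ARGV[1] ("one") is blanked in BEGIN, so awk reads only "two".
    ( cd "$dir" &&
      awk 'BEGIN { ARGV[1] = "" } { print FILENAME ": " $0 }' one two )
    rm -rf "$dir"
}

demo_cleared_argv    # prints: two: beta
```

The same mechanism is what lets the BEGIN rule above consume the options and the pattern without awk later trying to open them as data files.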
The next set of lines should be uncommented if you are not using
gawk.  This rule translates all the characters in the input line
into lowercase if the -i option is specified.[75]
The rule is
commented out as it is not necessary with gawk:
#{
#    if (IGNORECASE)
#        $0 = tolower($0)
#}
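Because the commented-out rule assigns to $0, any line printed afterward is the lowercase translation rather than the original text. One possible way around this in a non-gawk awk is to test a lowercase copy of the record and leave $0 alone. The sketch below is an assumption about how that might look, not the approach taken by egrep.awk; match_nocase is a hypothetical helper name:

```sh
# Hypothetical helper; usage: match_nocase pattern < input
match_nocase() {
    awk -v pattern="$1" '
    BEGIN { pattern = tolower(pattern) }
    # Compare against a lowercase copy; $0 itself is never modified,
    # so the line is printed exactly as it was read.
    tolower($0) ~ pattern { print }
    '
}

printf 'Hello World\ngoodbye\n' | match_nocase 'WORLD'    # prints: Hello World
```

Note that lowercasing the pattern is a blunt instrument: it also lowercases anything inside bracket expressions, so this is only an approximation of what IGNORECASE does in gawk.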
The beginfile() function is called by the rule in ftrans.awk
when each new file is processed.  In this case, it is very simple; all it
does is initialize a variable fcount to zero. fcount tracks
how many lines in the current file matched the pattern.
Naming the parameter junk shows we know that beginfile()
is called with a parameter, but that we’re not interested in its value:
function beginfile(junk)
{
    fcount = 0
}
The endfile() function is called after each file has been processed.
It affects the output only when the user wants a count of the number of lines that
matched.  no_print is true only if the exit status is desired.
count_only is true if line counts are desired.  egrep
therefore only prints line counts if printing and counting are enabled.
The output format must be adjusted depending upon the number of files to
process.  Finally, fcount is added to total, so that we
know the total number of lines that matched the pattern:
function endfile(file)
{
    if (! no_print && count_only) {
        if (do_filenames)
            print file ":" fcount
        else
            print fcount
    }
    total += fcount
}
The BEGINFILE and ENDFILE special patterns
(see section The BEGINFILE and ENDFILE Special Patterns) could be used, but then the program would be
gawk-specific. Additionally, this example was written before
gawk acquired BEGINFILE and ENDFILE.
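To show how beginfile() and endfile() cooperate with a file transition rule, here is a self-contained sketch. The rule that watches FILENAME is a simplified stand-in for what ftrans.awk provides, written from the description above rather than copied from the library; count_per_file, demo_count_per_file, and the variable oldfile are names made up for this example:

```sh
# Hypothetical helper; usage: count_per_file pattern file ...
count_per_file() {
    pattern=$1; shift
    awk -v pattern="$pattern" '
    # Assumed stand-in for the ftrans.awk rule: a change in FILENAME
    # marks the end of one file and the start of the next.
    FILENAME != oldfile {
        if (oldfile != "")
            endfile(oldfile)
        oldfile = FILENAME
        beginfile(FILENAME)
    }
    $0 ~ pattern { fcount++ }
    END { if (oldfile != "") endfile(oldfile) }

    function beginfile(junk) { fcount = 0 }
    function endfile(file)   { print file ":" fcount; total += fcount }
    ' "$@"
}

demo_count_per_file() {
    dir=$(mktemp -d) || return 1
    printf 'apple\nbanana\napricot\n' > "$dir/a.txt"
    printf 'cherry\navocado\n'        > "$dir/b.txt"
    ( cd "$dir" && count_per_file '^a' a.txt b.txt )
    rm -rf "$dir"
}

demo_count_per_file    # prints a.txt:2 and then b.txt:1
```

A rule of this kind only fires when a record is read, so an empty file never triggers beginfile() or endfile() in this sketch.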
The following rule does most of the work of matching lines. The variable
matches is true if the line matched the pattern. If the user
wants lines that did not match, the sense of matches is inverted
using the ‘!’ operator. fcount is incremented with the value of
matches, which is either one or zero, depending upon a
successful or unsuccessful match.  If the line does not match, the
next statement just moves on to the next record.
A number of additional tests are made, but they are only done if we
are not counting lines.  First, if the user only wants the exit status
(no_print is true), then it is enough to know that one
line in this file matched, and we can skip on to the next file with
nextfile.  Similarly, if we are only printing file names, we can
print the file name, and then skip to the next file with nextfile.
Finally, each line is printed, with a leading file name and colon
if necessary:
{
    matches = ($0 ~ pattern)
    if (invert)
        matches = ! matches
    fcount += matches    # 1 or 0
    if (! matches)
        next
    if (! count_only) {
        if (no_print)
            nextfile
        if (filenames_only) {
            print FILENAME
            nextfile
        }
        if (do_filenames)
            print FILENAME ":" $0
        else
            print
    }
}
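The counting idiom in this rule relies on a comparison yielding exactly one or zero, so the result can be negated and added directly. The following sketch isolates that idiom; count_lines is a hypothetical helper, and its optional second argument stands in for the -v option:

```sh
# Hypothetical helper; usage: count_lines pattern [invert] < input
count_lines() {
    awk -v pattern="$1" -v invert="${2:-0}" '
    {
        matches = ($0 ~ pattern)    # 1 if the line matches, else 0
        if (invert + 0)
            matches = ! matches     # flip 1 to 0 and 0 to 1
        fcount += matches
    }
    END { print fcount + 0 }        # + 0 prints 0 rather than an empty line
    '
}

printf 'one\ntwo\nthree\n' | count_lines 't'      # prints: 2
printf 'one\ntwo\nthree\n' | count_lines 't' 1    # prints: 1
```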
The END rule takes care of producing the correct exit status. If
there are no matches, the exit status is one; otherwise, it is zero:
END {
    exit (total == 0)
}
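The expression total == 0 evaluates to one when nothing matched and to zero otherwise, which is exactly the exit status convention the shell expects. A minimal sketch of the same idea outside egrep.awk follows; has_match is a hypothetical helper name:

```sh
# Hypothetical helper; usage: has_match pattern < input
has_match() {
    awk -v pattern="$1" '
    $0 ~ pattern { total++ }
    # Exit status 0 (success) if anything matched, 1 otherwise.
    END { exit (total == 0) }
    '
}

if printf 'foo\nbar\n' | has_match 'ba'; then
    echo "matched"
else
    echo "no match"
fi
```

Because the function produces no output, it behaves much like the program does when run with -s: only the exit status carries the answer.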
The usage() function prints a usage message in case of invalid options,
and then exits:
function usage()
{
    print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
    print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
    exit 1
}
[75] Translating the input line to lowercase also introduces a subtle bug: if a match happens, we output the translated line, not the original.