1.24. Program: psgrep
Many programs, including
ps, netstat,
lsof, ls -l, find
-ls, and tcpdump, can produce more
output than can be conveniently summarized. Logfiles also often grow
too long to be easily viewed. You could send these through a filter
like grep to pick out only certain lines, but
regular expressions and complex logic don't mix well; just look at
the hoops we jump through in Recipe 6.18.

What we'd really like is to make full queries on the program output
or logfile. For example, to ask ps something
like, "Show me all processes that exceed 10K in size but which aren't
running as the superuser" or "Which commands are running on
pseudo-ttys?"

The psgrep program does this, and
infinitely more, because the specified selection criteria are
not mere regular expressions; they're full Perl code. Each criterion
is applied in turn to every line of output. Only lines matching all
arguments are output. The following is a list of things to find and
how to find them.

Lines containing "sh" at the end of a word:
% psgrep '/sh\b/'
Processes whose command names end in "sh":
% psgrep 'command =~ /sh$/'
Processes running with a user ID below 10:
% psgrep 'uid < 10'
Login shells with active ttys:
% psgrep 'command =~ /^-/' 'tty ne "?"'
Processes running on pseudo-ttys:
% psgrep 'tty =~ /^[p-t]/'
Non-superuser processes running detached:
% psgrep 'uid && tty eq "?"'
Huge processes that aren't owned by the superuser:
% psgrep 'size > 10 * 2**10' 'uid != 0'
The last call to psgrep produced the following
output when run on our system. As one might expect, only
netscape and its spawn qualified.
  FLAGS   UID   PID  PPID PRI  NI   SIZE   RSS WCHAN       STA TTY TIME COMMAND
      0   101  9751     1   0   0  14932  9652 do_select   S   p1  0:25 netscape
 100000   101  9752  9751   0   0  10636   812 do_select   S   p1  0:00 (dns helper)
Example 1-6 shows the psgrep
program.
Example 1-6. psgrep
#!/usr/bin/perl -w
# psgrep - print selected lines of ps output by
# compiling user queries into code
use strict;
# each field from the PS header
my @fieldnames = qw(FLAGS UID PID PPID PRI NICE SIZE
RSS WCHAN STAT TTY TIME COMMAND);
# determine the unpack format needed (hard-coded for Linux ps)
my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
my %fields; # where the data will be stored
die <<Thanatos unless @ARGV;
usage: $0 criterion ...
    Each criterion is a Perl expression involving:
     @fieldnames
    All criteria must be met for a line to be printed.
Thanatos
# Create function aliases for uid, size, UID, SIZE, etc.
# Empty parens on closure args needed for void prototyping.
for my $name (@fieldnames) {
    no strict 'refs';
    *$name = *{lc $name} = sub ( ) { $fields{$name} };
}
my $code = "sub is_desirable { " . join(" and ", @ARGV) . " } ";
unless (eval $code.1) {
    die "Error in code: $@\n\t$code\n";
}
open(PS, "ps wwaxl |") || die "cannot fork: $!";
print scalar <PS>; # emit header line
while (<PS>) {
    @fields{@fieldnames} = trim(unpack($fmt, $_));
    print if is_desirable( ); # line matches their criteria
}
close(PS) || die "ps failed!";
# convert cut positions to unpack format
sub cut2fmt {
    my(@positions) = @_;
    my $template = '';
    my $lastpos = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}
sub trim {
    my @strings = @_;
    for (@strings) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @strings : $strings[0];
}
# the following was used to determine column cut points.
# sample input data follows
#123456789012345678901234567890123456789012345678901234567890123456789012345
#         1         2         3         4         5         6         7
# Positioning:
#       8     14    20    26  30  34     41    47          59  63  67   72
#       |     |     |     |   |   |      |     |           |   |   |    |
__END__
  FLAGS   UID   PID  PPID PRI  NI   SIZE   RSS WCHAN       STA TTY TIME COMMAND
    100     0     1     0   0   0    760   432 do_select   S   ?   0:02 init
    140     0   187     1   0   0    784   452 do_select   S   ?   0:02 syslogd
 100100   101   428     1   0   0   1436   944 do_exit     S   1   0:00 /bin/login
 100140    99 30217   402   0   0   1552  1008 posix_lock_ S   ?   0:00 httpd
      0   101   593   428   0   0   1780  1260 copy_thread S   1   0:00 -tcsh
 100000   101 30639  9562  17   0    924   496             R   p1  0:00 ps axl
      0   101 25145  9563   0   0   2964  2360 idetape_rea S   p2  0:06 trn
 100100     0 10116  9564   0   0   1412   928 setup_frame T   p3  0:00 ssh -C www
 100100     0 26560 26554   0   0   1076   572 setup_frame T   p2  0:00 less
 100000   101 19058  9562   0   0   1396   900 setup_frame T   p1  0:02 nvi /tmp/a
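The column handling in the listing above can be tried on its own. The following is a minimal sketch, not part of the original program: it reuses the cut2fmt logic and the hash-slice assignment, but builds its one input line with sprintf so that the fields are guaranteed to sit on the assumed cut positions. The sprintf layout and the inline trimming are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same logic as cut2fmt in the listing: each cut position is the
# 1-based column at which the next field begins.
sub cut2fmt {
    my @positions = @_;
    my $template  = '';
    my $lastpos   = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    return $template . "A*";
}

my @fieldnames = qw(FLAGS UID PID PPID PRI NICE SIZE
                    RSS WCHAN STAT TTY TIME COMMAND);
my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);

# Assumption: build a line whose field widths match the cut positions
# (7, 6, 6, 6, 4, 4, 7, 6, 12, 4, 4, 5, then the rest of the line).
my $line = sprintf('%7s%6s%6s%6s%4s%4s%7s%6s %-11s %-3s %-3s%5s %s',
    100100, 101, 428, 1, 0, 0, 1436, 944,
    'do_exit', 'S', '1', '0:00', '/bin/login');

# Hash slice: assign all thirteen fields in one statement,
# trimming leading and trailing whitespace from each.
my %fields;
@fields{@fieldnames} =
    map { my $s = $_; $s =~ s/^\s+//; $s =~ s/\s+$//; $s } unpack($fmt, $line);

print "$fmt\n";   # A7 A6 A6 A6 A4 A4 A7 A6 A12 A4 A4 A5 A*
print "$_=$fields{$_}\n" for qw(UID SIZE WCHAN COMMAND);
```

Porting to another ps layout would then amount to changing the list of cut positions and, if the columns differ, the field names.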
The psgrep program integrates many techniques
presented throughout this book. Stripping strings of leading and
trailing whitespace is found in Recipe 1.19.
Converting cut marks into an unpack format to
extract fixed fields is in Recipe 1.1.
Matching strings with regular expressions is the entire topic of
Chapter 6.

The multiline string in the here document passed to
die is discussed in Recipe 1.15 and Recipe 1.16. The
assignment to @fields{@fieldnames} sets many
values at once in the hash named %fields. Hash
slices are discussed in Recipe 4.8 and
Recipe 5.11.

The sample program input contained beneath __END__
is described in Recipe 7.12.
During development, we used canned input from the
DATA filehandle for testing purposes. Once the
program worked properly, we changed it to read from a piped-in
ps command but left a remnant of the original
filter input to aid in future porting and maintenance. Launching
other programs over a pipe is covered in Chapter 16, including Recipe 16.10 and Recipe 16.13.

The real power and expressiveness in psgrep
derive from Perl's use of string arguments not as mere strings but
directly as Perl code. This is similar to the technique in Recipe 9.9, except that in
psgrep, the user's arguments are wrapped with a
routine called is_desirable. That way, the cost of
compiling strings into Perl code happens only once, before the
program whose output we'll process is even begun. For example, asking
for UIDs under 10 creates this string to eval:
eval "sub is_desirable { uid < 10 } " . 1;
The mysterious ".1" at the end is so that if the
user code compiles, the whole eval returns true.
That way we don't even have to check $@ for
compilation errors as we do in Recipe 10.12.

Specifying arbitrary Perl code in a filter to select records is a
breathtakingly powerful approach, but it's not entirely original.
Perl owes much to the awk programming language,
which is often used for such filtering. One problem with
awk is that it can't easily treat input as
fixed-size fields instead of fields separated by something. Another
is that the fields are not mnemonically named:
awk uses $1,
$2, etc. Plus, Perl can do much that
awk cannot.

The user criteria don't even have to be simple expressions. For
example, this call initializes a variable $id to
user nobody's number to use later in its
expression:
% psgrep 'no strict "vars";
          BEGIN { $id = getpwnam("nobody") }
          uid == $id '
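The compile-once step described above can be sketched in isolation. This is a minimal illustration rather than the program's own code: compile_criteria is a hypothetical helper name, the two accessors are written out by hand instead of being generated, and the records are invented. It shows the criteria being joined with "and", wrapped in is_desirable, and compiled by a single eval whose trailing 1 supplies the true return value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %fields;                       # the current record, refilled per line

# Accessors with empty prototypes, written by hand for brevity.
sub uid  () { $fields{UID}  }
sub size () { $fields{SIZE} }

# Hypothetical helper: wrap the criteria in a named sub and compile it.
# A sub definition yields no useful value, so the appended 1 is what
# makes eval return true when the code compiles.
sub compile_criteria {
    my @criteria = @_;
    my $code = "sub is_desirable { " . join(" and ", @criteria) . " } ";
    return eval($code . 1) ? 1 : 0;
}

# A criterion that does not compile: eval returns undef and sets $@.
compile_criteria('uid ==') and die "expected a compilation failure";
print "bad criterion rejected\n";

# Valid criteria are compiled once, before any record is examined.
compile_criteria('uid != 0', 'size > 10 * 2**10')
    or die "Error in code: $@";

my @selected;
for my $rec ([0, 14932], [101, 812], [101, 14932]) {
    @fields{qw(UID SIZE)} = @$rec;
    push @selected, "$fields{UID}/$fields{SIZE}" if is_desirable();
}
print "selected: @selected\n";    # selected: 101/14932
```

Because is_desirable is an ordinary compiled subroutine by the time the loop runs, each record costs only a function call, not a fresh eval.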
How can we use
unquoted words without even a dollar sign, like
uid, command, and
size, to represent those respective fields in each
input record? We directly manipulate the symbol table by assigning
closures to indirect typeglobs, which creates functions with those
names. The function names are created using both uppercase and
lowercase names, allowing both "UID < 10" and
"uid > 10". Closures are described in Recipe 11.4, and assigning them to typeglobs to
create function aliases is shown in Recipe 10.14.

One twist here not seen in those recipes is empty parentheses on the
closure. These allow us to use the function in an expression
anywhere we'd use a single term, like a string or a numeric constant.
It creates a void prototype so the field-accessing function named
uid accepts no arguments, just like the built-in
function time. If these functions weren't
prototyped void, expressions like "uid < 10" or
"size/2 > rss" would confuse the parser because it would see
the unterminated start of a wildcard glob and of a pattern match,
respectively. Prototypes are discussed in Recipe 10.11.

The version of psgrep demonstrated here expects
the output from Red Hat Linux's ps. To port to
other systems, look at which columns the headers begin at. This
approach isn't relevant only to ps or only to
Unix systems; it's a generic technique for filtering input records
using Perl expressions, easily adapted to other record layouts. The
input format could be in columns, space separated, comma separated,
or the result of a pattern match with capturing parentheses.

The program could even be modified to handle a user-defined database
with a small change to the selection functions. If you had an array
of records as described in Recipe 11.9, you
could let users specify arbitrary selection criteria, such as:
sub id( ) { $_->{ID} }
sub title( ) { $_->{TITLE} }
sub executive( ) { title =~ /(?:vice-)?president/i }
# user search criteria go in the grep clause
@slowburners = grep { id<10 && !executive } @employees;
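To make the fragment above runnable, here is a minimal sketch with a few invented records. The employee data, including the NAME field, is hypothetical and exists only for illustration; the three accessor subs and the grep are the ones shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records, invented for illustration.
my @employees = (
    { ID =>  3, NAME => "Ann", TITLE => "Vice-President" },
    { ID =>  7, NAME => "Bob", TITLE => "Engineer"       },
    { ID => 42, NAME => "Cy",  TITLE => "Engineer"       },
);

# Empty prototypes let these be used as plain terms in expressions.
# Each reads the record that grep has aliased to $_.
sub id        () { $_->{ID} }
sub title     () { $_->{TITLE} }
sub executive () { title =~ /(?:vice-)?president/i }

# user search criteria go in the grep clause
my @slowburners = grep { id < 10 && !executive } @employees;

print "$_->{NAME}\n" for @slowburners;   # prints "Bob"
```

Ann is excluded because her title matches the executive pattern, and Cy because his ID is not below 10.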
For reasons of security and performance, this kind of power is seldom
found in database engines like those described in Chapter 14. SQL
doesn't support this, but given Perl and a small bit of ingenuity,
it's easy to roll your own.