Hack 14 Study Channel Statistics with pisg
Most IRC clients will give you the option of
saving messages to a log file. Generate entertaining statistics from
these log files.
pisg is the Perl IRC Statistics
Generator. It's available from the web site
http://pisg.sourceforge.net and
is one of the most popular IRC statistics generators in use today.
This hack will show you how to use it to create amusing statistics
for your channels and display them to everybody on the Web.
3.4.1 Running pisg
The most important thing you need in order to run pisg is a log file.
This log file should contain timestamps so pisg can tell when each
message was sent. pisg supports several log file formats, including
those used by mIRC, XChat, Eggdrop, irssi, infobot, and PircBot. You
will also need Perl in order to run pisg.
3.4.1.1 Editing pisg.cfg
Editing pisg.cfg should be your first step. Set up a channel item
that corresponds to the options you would like for your channel. This
lets you specify the name of the channel, the log file to read from,
the format of the log file, the maintainer of the log file, and the
name of the output file, for example:
<channel="#irchacks">
Logfile = "#irchacks.freenode.log"
Format = "mIRC"
Maintainer = "Bob"
OutputFile = "irchacks.html"
</channel>
Once everything is set up, it's simply a matter of running the pisg
script:
% ./pisg
pisg will then tear away at your log files and churn out its
statistics. In a matter of seconds to minutes (depending on your
computer's speed and the size of the log), you will have a file called
irchacks.html (or whatever else you called it) containing all of the
statistics.
3.4.2 Publishing pisg Statistics
Copy the output HTML file to
somewhere that can display web pages. Any old web server will do the
job, as it is just a static HTML page with no server-side content.
If you run your own web host, you could set the
OutputFile to be a full path in a directory
where the document would be visible on the Web. On a Unix/Linux box,
you could even set up a symlink to the file.
Wherever you decide to place the HTML file, you must also ensure that
the files from the gfx directory are in the same
place. These are used to create the colored bar charts in the
pisg output.
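As a rough sketch of the full-path approach, a channel item might look
like the following. The directory /var/www/html/stats is a hypothetical
web root used purely for illustration, and the output filename is only
an example; substitute whatever directory your web server actually
serves.

```
# /var/www/html/stats is a hypothetical directory served by your web server
<channel="#irchacks">
Logfile = "#irchacks.freenode.log"
Format = "mIRC"
Maintainer = "Bob"
OutputFile = "/var/www/html/stats/irchacks.html"
</channel>
```

With a setup like this, the contents of the gfx directory would need
to be copied into the same directory once, and each later run of pisg
would overwrite the page in place.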
3.4.2.1 Setting up statistics options
pisg has more configuration options than you can shake a stick at.
They are generally well documented. One common change is to enable
ShowWords and SortByWords, so that users are ranked by the number of
words they have written rather than by the number of lines (which is
more vulnerable to users attempting to pad their stats).
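A minimal sketch of how these two options might be switched on in
pisg.cfg is shown here. It assumes the global <set> syntax and a value
of "1" to enable an option, as described in the pisg documentation;
check the documentation shipped with your version before relying on
it.

```
# Assumed syntax: show a word count per user and rank users by words
<set ShowWords="1">
<set SortByWords="1">
```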
3.4.2.2 Nickname tracking
pisg has automatic nickname tracking. When it is enabled,
this feature watches for people who change their IRC nickname and
will merge the statistics for two nicknames if it thinks it is
appropriate. Unfortunately, many channels have periods of silliness
in which people may temporarily play a game of
"musical nicks," or various people
may switch to the same nick temporarily. This can seriously mess up
the statistics. If this is an issue, you can use user
lines instead.
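For reference, nickname tracking is controlled by a single option. The
sketch below assumes the option is named NickTracking and takes "1" to
enable and "0" to disable, which is how the pisg documentation
describes it; neither detail is spelled out in this hack.

```
# Assumed option name and values; use "0" if musical nicks spoil the stats
<set NickTracking="1">
```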
3.4.2.3 User lines
User lines are little lines in the
configuration file that contain information about a user. They
support several options:
<user nick="Fennec" alias="Fennec* Foo* Jacob* Jake|PDA" sex="m"
link="http://fennec.homedns.org">
The user's nick is the name of the user, as it will appear on the
stats page. The aliases are all other nicknames that should be
considered to be the same user. Wildcards are allowed with the *
character, but they have a tendency to slow down statistics
generation. The sex can be set to m or f; this causes the name to
display in blue or pink and also sets the pronouns used on the page,
for example, "he" or "she" instead of "he/she."
Either nickname tracking or user lines are necessary for a meaningful
Users With Most Nicknames section.
Other useful options available for user lines include the
ignore="y" option, which can be added to ignore a
user. This is often applied to bots; however, some channels also
include their bots in statistics, and it can be particularly amusing
if the bots talk as much as some regular users.
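A user line that ignores a bot could look something like this. The
nickname StatsBot is hypothetical and stands in for whatever your
channel's bot is called.

```
# "StatsBot" is a hypothetical bot nickname
<user nick="StatsBot" ignore="y">
```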
3.4.2.4 Photos and photo galleries
If you can cajole a channel's user base into sending pictures of
themselves (or if you manage to track them down, stalk them, and take
pictures yourself), you can use pisg as a sort of impromptu photo
gallery. First, set your ImagePath to where the images will be
accessible. Then you can add pic="nickname.png" to each user line.
With PicHeight and PicWidth, you can set a default picture height and
width for your page. Dimensions of approximately 66 x 48 pixels allow
for a compact but effective gallery.
Setting a user's bigPic option
will cause the user's picture to link to the
specified file. Including a wildcard as a user's
picture will cause one of the pictures that match the wildcard for
that user to be randomly selected. Setting the
UserPics option will allow more than one picture
per row. The DefaultPic option will allow you to
set up a default user picture.
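Putting the picture options together, a gallery setup might be
sketched as follows. The directory userpics/, the filenames, and the
numeric values are all hypothetical placeholders, and the <set> syntax
is assumed from the pisg documentation rather than taken from this
hack.

```
# All paths, filenames, and sizes below are illustrative placeholders
<set ImagePath="userpics/">
<set PicWidth="66">
<set PicHeight="48">
<set UserPics="3">
<set DefaultPic="nopic.png">
<user nick="Fennec" pic="fennec.png" bigpic="fennec-large.png">
```

Here the small picture fennec.png would appear next to the user's
statistics and link to fennec-large.png, while users without a pic
setting would fall back to nopic.png.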
3.4.2.5 Headers and footers
A custom header (or footer) with
some spiffy and topical images or a quote is a nice way of adding a
personalized touch to your statistics. This should be in HTML
(ideally, XHTML). For example, here are the contents of a generalized
header file:
<table border="1"><tr><td>
<table border="0">
<tr><td><img src="/image/library/english/10059_image.png" /></td>
<td align="center"><div align="center">
Spiffy amusing headline here!
<hr />
<font size="-4"><span style="color: #AAAAAA; font-size: 9px;">
Informational byline here.
</span></font>
</div></td>
<td><img src="/image/library/english/10059_picture.png" /></td>
</tr>
</table>
</td></tr></table>
Change /image/library/english/10059_image.png,
/image/library/english/10059_picture.png, the captions, and the
headline/byline as you see fit. This header works well with images
approximately 48 pixels high.
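To have pisg include such a file, the configuration needs to point at
it. The sketch below assumes the PageHead and PageFoot options
described in the pisg documentation; header.html and footer.html are
hypothetical filenames for files you have saved alongside pisg.cfg.

```
# header.html and footer.html are hypothetical filenames
<set PageHead="header.html">
<set PageFoot="footer.html">
```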
3.4.3 The Results
If you've set
everything up correctly, you'll end up with
something like Figure 3-5, with a colorful bar
chart showing which times of the day are most active, along with
pictures of each user. This bar chart is interesting in that it shows
activity starting at 8 a.m. and steadily growing before falling back
down at lunchtime. Even IRC users have to stop for lunch.
Figure 3-5. Output from pisg, showing activity periods and user info

pisg also generates several other pieces of information that are not
readily obvious, such as the Big Numbers section, shown in Figure 3-6.
This shows who asked the most questions, who shouted the most, who was
most aggressive, who was most disliked, and who was the happiest.
Figure 3-6. Some of the other statistics obtained from pisg

The pisg web site (http://pisg.sourceforge.net) contains links
to hundreds of real examples of pisg in
action.
Thomas Whaples