IRC Hacks [Electronic resources] (text version)

Hack 41 Log URLs People Mention

Logging URLs on IRC is useful in case you need
to refer to them later on. Learn an unusual and interesting way to do
it with a shell script on Linux/Unix.

Often, useful URLs are mentioned on a
channel, and you cannot visit them straightaway but would like to
check them out later. Perhaps you remember someone mentioning the URL
of a really cool page containing various useful IRC hacks, but you
just cannot remember it. Or maybe you just hate the constant cutting
and pasting of URLs that your friends keep posting.

In this hack, you will look at a simple IRC client that will be
absolutely passive: it will just sit in your channel, silently
noting down the URLs passing by. Because such a task would be too
simple in a language like Perl, let's show at the
same time that you can make useful IRC hacks in a pure shell script!

6.3.1 The Code

The trivial solution would be to have an input block, emitting just a
few commands required to negotiate the connection and join the
channel. This block would be piped to netcat, with netcat's output
then redirected to another block, munching the server's lines and
selecting the PRIVMSG messages that contain a URL.

But the world is never this simple. This architecture has a fatal
flaw: once the input block has emitted its commands, you cannot send
anything else to the server. The most immediate consequence is the
inability to reply to PINGs from the server, which means that the
server will decide the connection is dead and close it. You could, of
course, cheat by showing some periodic activity. However, what if you
want to make a more elaborate interface available, like being able to
join more channels on request? Or handle various errors properly?
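To make the PING problem concrete, here is a minimal sketch of the exchange the server expects. The function name answer_pings and the host irc.example.net are made up for illustration; the function only reads server lines on standard input and writes replies to standard output, so it runs without a network connection.

```shell
# Hypothetical helper: read raw server lines on stdin and answer every
# PING with a PONG carrying the same payload. Other lines are ignored.
answer_pings() {
    while IFS= read -r line; do
        # IRC lines end in CRLF; drop the carriage return.
        line=$(printf '%s' "$line" | tr -d '\r')
        case $line in
            "PING "*) printf 'PONG %s\r\n' "${line#PING }" ;;
        esac
    done
}

# Feed it two canned lines (made-up data): only the PING gets a reply.
printf 'PING :irc.example.net\r\n:nick!user@host PRIVMSG #irchacks :hello\r\n' |
    answer_pings
```

The catch described above is that the reply has to travel back over the same connection the PING arrived on, which a one-way pipeline cannot do.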

You should ideally remove this limitation and somehow connect the
input and output blocks. But how can one do that? Try thinking about
it and see if your ideas work as you expected.

The bash shell can do wonderful things with redirections. You can,
for example, redirect to or from a special file that triggers some
magic inside of bash. One of these files is /dev/tcp/hostname/port,
which establishes a TCP connection. You can say, "Whatever, I have my
netcat and love it!", but first realize that this way the socket
behaves like a file for redirection purposes, and that considerably
expands our possibilities.

What about the other redirection trick? You need to direct both your
input and output to the socket. The answer is using the <>
redirection operator, which will open the given (magic) file for both
input and output. But it acts upon stdin, so you also need to
redirect stdout to stdin with >&0.
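The effect of the two redirections can be tried without a network connection. The sketch below is only an illustration: it uses an ordinary temporary file as a stand-in for the /dev/tcp socket, and the file contents are made up. The block reads its first line from the file and writes its answer back through the same open file.

```shell
# A temporary file plays the role of the socket in this sketch.
tmp=$(mktemp)
printf 'hello\n' >"$tmp"

{
    # stdin was opened read/write by <>, so this reads "hello".
    read -r first
    # stdout is a copy of stdin (>&0), so this lands in the same file,
    # right after the line that was just read.
    echo "got: $first"
} <>"$tmp" >&0

result=$(cat "$tmp")
rm -f "$tmp"
echo "$result"
# prints:
# hello
# got: hello
```

With a socket in place of the file, whatever the block reads comes from the server and whatever it echoes goes to the server.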

In case something goes wrong and you cannot connect, or if the server
dies, you should try again. That is easy: the read in the main
input while loop fails and it bails out, therefore
you must add yet another while loop around the
whole block. You should not forget to sleep for some reasonable time
between the iterations in case the connection failure is repetitive.

Save the following as urlgrab.sh:

#!/bin/bash
# IRC URL grabber: records all URLs mentioned to a log file.
# This script is public domain.
# It needs bash (for the /dev/tcp redirection) and GNU sed.

# Configuration section.
SERVER="irc.freenode.net"
PORT="6667"
NICK="urlspy"
IDENT="urlspy"
IRCNAME="URL Grabber"
CHANNEL="#irchacks" # We can specify multiple channels separated by a comma.
LOGFILE="url.log"

# Try to reconnect in case the connection fails.
while true; do
    # Standard input/output of this block is redirected to an IRC connection.
    {
        # We prepare a few raw IRC commands and send them out in advance.  We
        # do not do any error checking, therefore if one of the commands
        # fails, the game is over.
        echo "USER $IDENT x x :$IRCNAME"
        echo "NICK $NICK"
        echo "JOIN $CHANNEL"

        # -r keeps read from eating backslashes in the server's lines.
        while read -r input; do
            # Strip the CRLF at the end of each line.
            input=`echo "$input" | tr -d '\r\n'`

            # If this is a PING, then send a PONG back.
            ping=`echo "$input" | cut -d " " -f 1`
            if [ "$ping" = "PING" ]; then
                data=`echo "$input" | cut -d " " -f 2-`
                echo "PONG $data"
                continue
            fi

            # One PRIVMSG line looks like:
            # :pasky!pasky@pasky.or.cz PRIVMSG #elinks :(IRC hack ;)
            #  --------source--------- --cmd-- -dest--  ---text-----
            cmd=`echo "$input" | cut -d " " -f 2`
            if [ "$cmd" != "PRIVMSG" ]; then
                continue
            fi

            # Extract the other fields from the message.
            # We must not forget to strip the leading colons from $source
            # and $text.
            source=`echo "$input" | cut -d " " -f 1`
            source=`echo "$source" | sed 's/^://'`
            target=`echo "$input" | cut -d " " -f 3`
            text=`echo "$input" | cut -d " " -f 4-`
            text=`echo "$text" | sed 's/^://'`

            # Our URL-matching regular expression is of course far from perfect.
            # Some more complex ones can be found
            # (e.g., at http://www.regexp.org/486).
            # Sed won't print the lines out on its own because of -n and the 'p'
            # flag will utter the line only if the substitution succeeded.
            # This hack requires GNU sed.
            # The expression is kept on a single line, since a newline inside
            # the quoted sed script would end the s command early.  The hyphen
            # comes first in the bracket expression so it is taken literally.
            url=`echo "$text" | sed -n 's/^.*\(\(http\|ftp\)s\{0,1\}:\/\/[-.,\/%~=@_&:?#a-zA-Z0-9]*[\/=#a-zA-Z0-9]\).*$/\1/gp'`

            if [ "$url" ]; then
                # One line in the log shall look like:
                # ----date---- :: ---source--- -> ---dest--- :: ---url---
                # The redirection must sit on the same line as the echo,
                # otherwise the log entry is sent to the IRC server instead.
                echo `date` ":: $source -> $target :: $url" >>"$LOGFILE"
            fi
        done
    } <>/dev/tcp/$SERVER/$PORT >&0
    sleep 30
done

6.3.2 Running the Hack

First, you need to change the settings
at the start of the file and tweak the configuration to suit your
needs. Then just execute the script and watch your log file slowly
grow.

To make the script executable, you can use the
chmod command:

% chmod u+x urlgrab.sh

Then you can run the script from the command line:

% ./urlgrab.sh

Whenever a URL is detected within a message, it will append a line
like this to the log file:

Mar 20 19:39:23 2004 :: pasky!pasky@pasky.or.cz -> #ch :: http://hacks.oreilly.com/

You can use this log file in any way you want, whether
it's for your own personal use or to display the
most popular links on a web page.
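As one way of getting at the most popular links, the sketch below counts how often each URL appears in a log written in the format shown above. The function name top_urls is made up, and so are the second and third sample entries; the function assumes the URL is always the last " :: "-separated field.

```shell
# Hypothetical helper: print "count url" pairs for the given log file,
# most frequently mentioned URL first.
top_urls() {
    awk -F ' :: ' '{ print $NF }' "$1" | sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# A small sample log; only the first line comes from the text above.
log=$(mktemp)
cat >"$log" <<'EOF'
Mar 20 19:39:23 2004 :: pasky!pasky@pasky.or.cz -> #ch :: http://hacks.oreilly.com/
Mar 20 19:41:02 2004 :: pasky!pasky@pasky.or.cz -> #ch :: http://www.example.com/
Mar 20 19:45:10 2004 :: pasky!pasky@pasky.or.cz -> #ch :: http://hacks.oreilly.com/
EOF

top=$(top_urls "$log")
rm -f "$log"
echo "$top"
# prints:
# 2 http://hacks.oreilly.com/
# 1 http://www.example.com/
```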

6.3.3 Hacking the Hack

The basic flaw here is obviously the lack of portability of our
redirection tricks. This one is, however, easily fixed. There are
alternative solutions, perhaps less elegant, but still very usable.

The simplest approach would involve having an "input" file that you
tail -f to netcat. Then you can turn the rest of the script into an
output block, where you just append all the commands to the input
file, for example:

# ... configuration ...
# If you don't have mktemp installed (http://www.mktemp.org/mktemp/) you can
# use `/tmp/urlgrab.$$` instead, at the risk of a security problem.
TMPFILE=`mktemp`
tail -f "$TMPFILE" | nc "$SERVER" "$PORT" | {
    # ... the original block's body ...
} >>"$TMPFILE"
rm "$TMPFILE"

Alternatively, you could use mkfifo to pass the data through a named
pipe, which is less portable, but takes virtually no disk space and
might be more efficient. In that case, you could use a simple cat
instead of tail -f.

Of course, there are a lot of other possible enhancements. You should
ideally handle any errors correctly. That means some code inflation,
as you can't just dump the startup commands blindly,
but you would have to wait for some numerics from the server to
indicate that you have succeeded in connecting before you send the
JOIN command.
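A minimal sketch of such a wait is shown here. The function name wait_and_join is made up; it relies on two standard numerics, 001 (the welcome reply sent once registration has succeeded) and 433 (nickname already in use), and treats everything else as noise. Like the main loop, it reads server lines on standard input and writes commands to standard output, so it can be tried with canned input.

```shell
# Hypothetical helper: wait for the welcome numeric before joining.
# Returns 0 after sending JOIN, 1 if registration fails or input ends.
wait_and_join() {
    channel=$1
    while IFS= read -r line; do
        line=$(printf '%s' "$line" | tr -d '\r')
        # Some servers send a PING during registration; answer it.
        case $line in
            "PING "*) printf 'PONG %s\r\n' "${line#PING }"; continue ;;
        esac
        # The numeric is the second space-separated field.
        numeric=$(printf '%s\n' "$line" | cut -d ' ' -f 2)
        case $numeric in
            001) printf 'JOIN %s\r\n' "$channel"; return 0 ;;
            433) echo "nickname is already in use" >&2; return 1 ;;
        esac
    done
    return 1
}

# Canned server output (made-up data) standing in for a real connection.
printf ':irc.example.net NOTICE * :Looking up your hostname\r\n:irc.example.net 001 urlspy :Welcome\r\n' |
    wait_and_join '#irchacks'
```

In the script, a call like this would replace the unconditional echo of the JOIN command, with the rest of the main loop running only if it returns success.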

Another issue is cycling between multiple IRC servers, which is a
must for a reliable IRC bot. It is easy to do using
cut and a cycling counter.
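One way this could look is sketched below. The server names are placeholders and the names SERVERS, COUNT, INDEX, and next_server are made up for the example; the idea is simply to keep the servers in a comma-separated list and let cut pick the field a wrapping counter points at.

```shell
# Hypothetical server list; replace it with the servers of your network.
SERVERS="irc.example.net,irc.example.org,irc.example.com"
# Number of entries in the list.
COUNT=$(echo "$SERVERS" | tr ',' '\n' | wc -l)
INDEX=0

# Select the next server, wrapping around after the last one.
# Call it directly (not in a subshell) so INDEX keeps its value.
next_server() {
    INDEX=$(( INDEX % COUNT + 1 ))
    SERVER=$(echo "$SERVERS" | cut -d , -f "$INDEX")
}

next_server; echo "$SERVER"   # irc.example.net
next_server; echo "$SERVER"   # irc.example.org
next_server; echo "$SERVER"   # irc.example.com
next_server; echo "$SERVER"   # irc.example.net again
```

In the script, next_server would be called at the top of the outer while loop, just before the block that connects.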

Maybe you would like to log the whole line containing the URL instead
of just the URL itself? This is useful if you are interested in the
context or if there is a description placed near the URL. All you
will need to do is modify the sed script from s/regexp/\1/gp to
/regexp/p.
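As a rough illustration of that change, the sketch below runs the search form on a made-up message. The regular expression is written in the spirit of the one in the script rather than copied from it, and it still assumes GNU sed because of the \| alternation.

```shell
# Made-up message text; in the script this would be $text.
text='the docs moved to http://hacks.oreilly.com/ last week'

# /regexp/p prints the whole line when the pattern is found anywhere in it.
line=$(printf '%s\n' "$text" |
    sed -n '/\(http\|ftp\)s\{0,1\}:\/\/[-.,\/%~=@_&:?#a-zA-Z0-9]*[\/=#a-zA-Z0-9]/p')
echo "$line"
# prints: the docs moved to http://hacks.oreilly.com/ last week

# A message without a URL produces no output at all.
none=$(printf '%s\n' 'no links here' |
    sed -n '/\(http\|ftp\)s\{0,1\}:\/\/[-.,\/%~=@_&:?#a-zA-Z0-9]*[\/=#a-zA-Z0-9]/p')
```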

A significantly more challenging problem is handling the possibility
of multiple URLs in the same message. When you use just a simple
search instead of the substitution, as outlined earlier, this is not
an issue, but otherwise you would need to weed out the non-URL parts.
Even though this problem would probably be solvable in sed, at this
level of complexity it is wiser to switch to something more
convenient, such as Perl. This would result in replacing the sed
statement with something like this:

perl -nle 'print join (" ",
m$((?:http|ftp)s?://[-\.,/%~=\@_\&:\?#a-zA-Z0-9]*[\/=#a-zA-Z0-9])$g);'

The -e flag makes Perl execute the given statement, -n runs the
statement for each line of input, and -l strips the newline from each
input line and adds one to everything that is printed. m$regexp$g
matches the input as many times as necessary and returns a list of
the matched URLs, which is then joined by spaces and printed out.

This hack doesn't even consider all the possibilities regarding
various uses of the logged URLs, from automatic opening in a web
browser to storing them in an SQL database [Hack #42]. This part is
definitely up to your imagination.

Petr Baudis
