IRC Hacks [Electronic resources], text version


Paul Mutton


Hack 48 Getting Friendly with FOAFBot

Come to grips with the Semantic Web by using
FOAFBot to find out information about your friends and strangers
alike.

The Semantic Web is the next generation of the Web. Instead of being
made up of just web pages, the Semantic Web uses languages that store
information in a form that computers can process. Using standard
languages like RDF (http://www.w3.org/TR/rdf-syntax-grammar),
RDFS, and OWL (http://www.w3.org/TR/owl-features), users can
create files called ontologies that define classes of things and the
properties that apply to them. People can then create instances of any
class that anyone has defined on the Semantic Web.

FOAF, the Friend-Of-A-Friend ontology
(http://www.foaf-project.org),
is one of the more popular ontologies and
Semantic Web applications. The ontology defines a class called
Person, and the related properties, such as name, email address, web
page address, photographic depictions, and, most importantly, whom
the person knows. When people create FOAF data about themselves and
their friends, they can point to the FOAF files of their friends.
Those files will, in turn, give information about the friends and
point to the FOAF files of people the friend knows. This branches out
to form a large social network. Some common properties of people in
FOAF are listed in Table 7-1.

Table 7-1. FOAF properties

accountName
accountServiceHomepage
aimChatID
based_near
currentProject
depiction
dnaChecksum
family_name
firstName
fundedBy
geekcode
gender
givenname
holdsAccount
homepage
icqChatID
img
interest
jabberID
knows
mbox
mbox_sha1sum
msnChatID
myersBriggs
name
nick
page
pastProject
phone
plan
publications
schoolHomepage
surname
title
topic
topic_interest
weblog
workInfoHomepage
workplaceHomepage
yahooChatID

Many web applications show FOAF data and the resulting networks.
Foafnaut (http://www.foafnaut.org) is an SVG-based
visualization of the FOAF networks. Foaf-a-matic is a web-based form
that automatically creates a FOAF file without requiring the user to
learn the Semantic Web languages. Several other applications are
linked from the FOAF Project web site (http://foaf-project.org).

Edd Dumbill created the first
FOAFBot, which could
be queried for personal information about any person in the network,
including who they know. More information, including his original
Python source code, is available at http://usefulinc.com/foaf/foafbot. This hack
will present the steps required to create your own FOAF-aware IRC
agent.


7.6.1 Parsing a FOAF File


FOAF files are written in RDF, using the vocabulary that the FOAF
ontology defines in OWL, the Web Ontology Language. Writing a good
parser for these languages would take a long time, but luckily, many
are available for free on the Web. One of the most popular is Jena,
developed at HP Labs (http://www.hpl.hp.com/semweb/jena). It is
a Java-based parser, available in a single JAR file. The online
documentation is excellent, and the API is relatively intuitive. In
this section, you will be taken through the steps of loading a FOAF
file with Jena, retrieving the relevant information, and storing it
in a data structure.

Before you start, there are some Semantic Web basics that are worth
knowing. Everything on the Semantic Web (files, classes, properties,
and instances) is identified by its URI. A URI (Uniform Resource
Identifier) is a web address for the concept. URIs generally take the
form of the web address of the file, followed by a "#" and the ID of
a concept. For example, the URI of a FOAF file may be
http://example.com/myFoaf.rdf. If, within that file, you defined an
instance of the Person class with the ID "BobSmith," the URI for Bob
would be http://example.com/myFoaf.rdf#BobSmith.
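The file-plus-ID structure of such a URI can be illustrated with a
small sketch that uses only the standard java.net.URI class. The class
name UriParts and its two helper methods are hypothetical names chosen
for this illustration; they are not part of Jena or of the hack's own
code.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Jena or the hack.
final class UriParts {
    // Returns the web address of the file that defines the concept.
    static String fileAddress(String uri) {
        int hash = uri.indexOf('#');
        return hash < 0 ? uri : uri.substring(0, hash);
    }

    // Returns the ID after the "#", or null when the URI has no fragment.
    static String conceptId(String uri) {
        return URI.create(uri).getFragment();
    }

    public static void main(String[] args) {
        String bob = "http://example.com/myFoaf.rdf#BobSmith";
        System.out.println(fileAddress(bob)); // http://example.com/myFoaf.rdf
        System.out.println(conceptId(bob));   // BobSmith
    }
}
```

Not every URI on the Semantic Web uses a "#" fragment, so code that
relies on this split should be prepared for conceptId to return null.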

A statement on the Semantic
Web takes a form called a triple. As you might expect, a triple
has three parts: subject, predicate, and object. The subject is the
thing being described. The predicate is the property of the subject
that is being described, and the object is the value of the property.
For example, say there was a property
"age." Table 7-2
shows an example of a triple representing Bob Smith, age 21.

Table 7-2. Example of a triple

Subject     Predicate   Object
BobSmith    age         21

Since everything on the Semantic Web is identified by a URI, every
property, class, and instance in the triple is actually identified by
its URI. The full triple for the example in the table would be:

Subject:   http://example.com/myFoaf.rdf#BobSmith   
Predicate: http://example.com/another.rdf#age
Object: 21

Here, the object "21" is just a literal value, so it does not get a
URI. If you wanted to connect two objects, say Bob Smith and Joe
Schmoe, in a triple, there would be three URIs:

Subject:   http://example.com/myFoaf.rdf#BobSmith   
Predicate: http://example.com/another.rdf#knows
Object: http://example.com/myFoaf.rdf#JoeSchmoe

A general familiarity with this triple and URI structure will make
the Jena output easier to understand and work
with.
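To make the subject, predicate, and object structure concrete, here is
a minimal sketch of a triple as a plain Java value class. The Triple
class is hypothetical and exists only for illustration; in the hack
itself, Jena's Statement type plays this role. The toString output
merely resembles the N-Triples notation and is not meant to be a
conforming serializer.

```java
// Hypothetical value class for illustration; Jena's Statement is used in the hack.
final class Triple {
    final String subject;          // URI of the thing being described
    final String predicate;        // URI of the property
    final String object;           // URI of another resource, or a literal value
    final boolean objectIsLiteral; // true when the object is a plain value such as "21"

    Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.objectIsLiteral = objectIsLiteral;
    }

    // Literals are quoted; URIs are wrapped in angle brackets.
    @Override
    public String toString() {
        String obj = objectIsLiteral ? "\"" + object + "\"" : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }

    public static void main(String[] args) {
        String bob = "http://example.com/myFoaf.rdf#BobSmith";
        System.out.println(new Triple(bob, "http://example.com/another.rdf#age", "21", true));
        System.out.println(new Triple(bob, "http://example.com/another.rdf#knows",
                "http://example.com/myFoaf.rdf#JoeSchmoe", false));
    }
}
```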

To begin coding, you will need a class to store all of the FOAF
information about a person. The class should be able to hold every
property available in FOAF. Each property value will be a string;
however, a person can have multiple values for any of these
properties (e.g., a person can have multiple email addresses). Thus,
the class will maintain a Vector of Strings to store the values for
each property:

import java.util.*;
import com.hp.hpl.jena.rdf.model.*;

public class Person {
    // Store the info in a hash of Vectors.
    public Hashtable foafData = new Hashtable( );

    public Person( ) {
        // For now, we will leave this blank...
    }
}

With the class in hand, you need to parse the FOAF file and add the
correct values to an instance of the Person class. To parse a file in
Jena, you first create a model and then read the FOAF file into the
model. The FOAF filename should be given by its address on the Web:

import java.util.*;
import java.awt.*;
import com.hp.hpl.jena.rdf.model.*;
import java.io.*;

public class Foaf {
    private static Hashtable foafHash = new Hashtable( );
    private static String inputFile = "http://www.cs.umd.edu/~golbeck/foaf.rdf";

    public static void main (String argv[]) {
        Model model = ModelFactory.createDefaultModel( );
        model.read(inputFile);
    }
}

Once the model has parsed the file, you have to retrieve the
triples. The Jena web docs are useful in
this respect. To make the process easier, the code for iterating
through the statements is:

// Get a list of the subjects.
ResIterator it = model.listSubjects( );
while (it.hasNext( )) {
    Resource subject = it.nextResource( );
    // Get all the properties of the current subject.
    StmtIterator statements = subject.listProperties( );
    while (statements.hasNext( )) {
        // This statement is a triple (subject, predicate, and object)
        Statement s = statements.nextStatement( );
    }
}

Now that you have access to the triples in the file, storing the FOAF
data comes down to a basic series of if
statements. Each time a new subject is
encountered, you create an instance of the Person class. For each of
the properties of the subject, you will check the URI of the
predicate and, if it is a FOAF property, add the value to the proper
Vector in the Person's Hashtable.

while (it.hasNext( )) {
    Resource subject = it.nextResource( );
    // Create the person that this subject may represent.
    Person p = new Person( );
    boolean isPerson = false;
    // Get all of the properties of the current subject.
    StmtIterator statements = subject.listProperties( );
    while (statements.hasNext( )) {
        // This statement is a triple: subject, predicate, and object.
        Statement s = statements.nextStatement( );
        // Check to see if this subject is actually a FOAF Person.
        if (s.getPredicate( ).toString( ).equals(
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type") &&
            s.getObject( ).toString( ).equals("http://xmlns.com/foaf/0.1/Person")) {
            isPerson = true;
        }
        // Now check for each foaf property and add it.
        String base = "http://xmlns.com/foaf/0.1/";
        String key = s.getPredicate( ).toString( );
        if (key.startsWith(base)) {
            Vector v = (Vector) p.foafData.get(key.substring(base.length( )));
            if (v == null) {
                v = new Vector( );
                p.foafData.put(key.substring(base.length( )), v);
            }
            v.add(s.getObject( ).toString( ));
        }
    } // End statement loop.

In the preceding example, the String base is
placed within the loop for clarity. Since it is always the same, that
line can easily be moved somewhere else in the code to prevent the
step of redeclaring the variable on each iteration.
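The prefix check and the "create the list on first use" pattern can
also be tried out on their own, without Jena. The following is a
minimal sketch under stated assumptions: the class FoafProperties and
the method addIfFoaf are hypothetical names, and the sketch uses a
generic Map of Lists rather than the Hashtable of Vectors used in the
hack, purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the hack itself stores values in a Hashtable of Vectors.
final class FoafProperties {
    static final String FOAF_BASE = "http://xmlns.com/foaf/0.1/";

    // Stores the value under the property's local name (e.g. "mbox") when the
    // predicate is in the FOAF namespace. Returns true when the value was stored.
    static boolean addIfFoaf(Map<String, List<String>> data, String predicateUri, String value) {
        if (!predicateUri.startsWith(FOAF_BASE)) {
            return false;
        }
        String localName = predicateUri.substring(FOAF_BASE.length());
        // Create the list the first time this property is seen, then append.
        data.computeIfAbsent(localName, k -> new ArrayList<>()).add(value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new HashMap<>();
        addIfFoaf(data, FOAF_BASE + "mbox", "mailto:bob@example.com");
        addIfFoaf(data, FOAF_BASE + "mbox", "mailto:bob@example.org");
        addIfFoaf(data, "http://www.w3.org/2000/01/rdf-schema#seeAlso", "ignored");
        System.out.println(data);
    }
}
```

Declaring the namespace once as a constant, as done here with
FOAF_BASE, is one way to avoid redeclaring it on every iteration.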

There are two issues to address before adding this Person object,
p, to the Hashtable. First, on the Semantic Web, a file can contain
many kinds of data. There is no requirement that a FOAF file must
contain only FOAF data; it may contain information about anything.
As the file is parsed, it is necessary to confirm that the subject
you are parsing is actually a FOAF Person. If it is, you must add it
to a Hashtable that will store all of the instances of your Person
class. If it is not a FOAF Person, you should just throw away the
Person object that you created. The following code makes use of the
foafHash declared previously:

    if (isPerson) {
        Vector mboxes = (Vector) p.foafData.get("mbox");
        if (mboxes != null) {
            for (int i = 0; i < mboxes.size( ); i++) {
                String mail = (String) mboxes.elementAt(i);
                // Sometimes, people preface their mail address with mailto:
                // We'll take it off to make the interface nicer. This is done
                // before the lookup so that the key checked is the key stored.
                if (mail.startsWith("mailto:"))
                    mail = mail.substring(7);
                if (foafHash.get(mail) != null && foafHash.get(mail) != p) {
                    merge(p, mail);
                }
                foafHash.put(mail, p);
            }
        }
        Vector sums = (Vector) p.foafData.get("mbox_sha1sum");
        if (sums != null) {
            for (int i = 0; i < sums.size( ); i++) {
                String mail = (String) sums.elementAt(i);
                if (foafHash.get(mail) != null && foafHash.get(mail) != p) {
                    merge(p, mail);
                }
                foafHash.put(mail, p);
            }
        }
    }
} // End subject loop.

Notice that in both loops, before the instance of the Person class is
added to the Hashtable, the following logic is required:

if (foafHash.get(key) != null && foafHash.get(key) != p) {
    merge(p, key);
}

Because you may have already parsed information about this Person
somewhere else in the file and added an instance of the Person class
to the Hashtable, another instance of the class may already be stored
under the same key with different information. In this case, you need
to merge the data from the two Person objects. The if statement checks
that the stored Person object is different from the current Person
object, to avoid needlessly merging an object with itself. The merge
function will copy all of the information into one object, so that a
single object represents the person.

private static void merge(Person p, String mail) {
    Person q = (Person) foafHash.get(mail);
    // Go through every property stored for q.
    for (Enumeration e = q.foafData.keys( ); e.hasMoreElements( );) {
        String curKey = (String) e.nextElement( );
        Vector qsData = (Vector) q.foafData.get(curKey);
        Vector psData = (Vector) p.foafData.get(curKey);
        if (psData == null) {
            // p has no values for this property yet, so create and store the Vector.
            psData = new Vector( );
            p.foafData.put(curKey, psData);
        }
        // Go through each value in q's Vector for this property.
        for (int i = 0; i < qsData.size( ); i++) {
            String curVal = (String) qsData.elementAt(i);
            // Don't add a value to p if it's already there.
            if (!psData.contains(curVal)) {
                // Add the value from q to p.
                psData.add(curVal);
            }
        }
    }
    // p now holds everything q knew, so let the hash entry refer to p.
    foafHash.put(mail, p);
}

This code completes the parsing of a single FOAF file. It may seem
complicated, but that is the bulk of everything that has to be done
to build this IRC bot. The next two steps take advantage of all of
this parsing with only a few more lines of code.
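The merge step described above can be exercised in isolation. The
following is a minimal sketch, assuming the per-person data is held in
a generic Map of Lists rather than the Hashtable of Vectors used in the
hack; MergeSketch and mergeInto are hypothetical names introduced only
for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of merging two sets of multi-valued properties.
final class MergeSketch {
    // Copies every value from 'from' into 'into', skipping values already present.
    static void mergeInto(Map<String, List<String>> into, Map<String, List<String>> from) {
        for (Map.Entry<String, List<String>> entry : from.entrySet()) {
            // Create the target list when 'into' has never seen this property.
            List<String> target = into.computeIfAbsent(entry.getKey(), k -> new ArrayList<>());
            for (String value : entry.getValue()) {
                if (!target.contains(value)) {
                    target.add(value);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> p = new HashMap<>();
        p.put("name", new ArrayList<>(Arrays.asList("Bob Smith")));
        Map<String, List<String>> q = new HashMap<>();
        q.put("name", new ArrayList<>(Arrays.asList("Bob Smith", "Robert Smith")));
        q.put("nick", new ArrayList<>(Arrays.asList("bobs")));
        mergeInto(p, q);
        System.out.println(p);
    }
}
```

Iterating over the properties of the object being merged in, rather
than those of the receiving object, ensures that properties known only
to the older object are not lost.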


7.6.2 Crawling FOAF Files


FOAF is interesting because it creates a social network: many people
are interconnected through linked files. The previous code will parse
a single file into the Hashtable, but to collect FOAF data, it is
necessary to crawl over files that are linked together. This requires
only a few additions. First, you can use a Vector to store the URIs of
files to parse:

Vector uris = new Vector( );
uris.add(inputFile);
while (uris.size( ) > 0) {
    // Remove the first element in the Vector.
    inputFile = (String) uris.remove(0);
    /*
     * Here, we insert the previous code that parses the file and builds
     * our model. It is omitted from this example for brevity.
     */
}

You will parse each URL as outlined earlier. As you parse, one more
if statement will be required to check for
"see also" links. These links point
to other files. When encountered, these links will be added to the
Vector of URIs. The following should be added to the list of
if statements that checks for all of the other
FOAF properties:

if (s.getPredicate( ).toString( ).equals(
        "http://www.w3.org/2000/01/rdf-schema#seeAlso")) {
    uris.add(s.getObject( ).toString( ));
}

With these two small changes, the code will now crawl along the
semantic links in each file to parse every FOAF file connected to the
network!


The FOAF network is huge, and it will take
days to crawl through the whole lot. To get your
bot up and running quickly, consider skipping the crawl by
eliminating this last section of code and instead listing a handful
of FOAF files you want included in your bot's
database.
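Because FOAF files often link back to one another, a crawler may also
want to remember which URIs it has already queued and to stop after a
fixed number of files. The following is a minimal sketch of that idea
and not the hack's own code: CrawlSketch, crawl, and the maxFiles
limit are hypothetical, and linksOf is a stand-in for "parse this file
and return its seeAlso targets," supplied here by an in-memory map so
that the example runs without network access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a bounded crawl that never queues the same URI twice.
final class CrawlSketch {
    // Returns the URIs in the order they were processed.
    static List<String> crawl(String start, Function<String, List<String>> linksOf, int maxFiles) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> processed = new ArrayList<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty() && processed.size() < maxFiles) {
            String uri = queue.remove();
            processed.add(uri); // In the real bot, the file would be parsed here.
            for (String next : linksOf.apply(uri)) {
                // Set.add returns false for URIs that were already queued.
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // A tiny made-up link graph with a cycle between a.rdf and b.rdf.
        Map<String, List<String>> links = new HashMap<>();
        links.put("a.rdf", Arrays.asList("b.rdf", "c.rdf"));
        links.put("b.rdf", Arrays.asList("a.rdf"));
        Function<String, List<String>> linksOf =
                uri -> links.getOrDefault(uri, Collections.<String>emptyList());
        System.out.println(crawl("a.rdf", linksOf, 100));
    }
}
```

A cap such as maxFiles is one simple way to keep a first run short;
listing a handful of starting files by hand remains the quickest
option.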


7.6.3 Writing the IRC Interface


Finally, once the previous code has been
executed, the Hashtable foafHash will contain all
of our Person objects with the correct information. That will take
place as an initialization step. The last step to complete FOAFBot is
to create the IRC bot interface. Since this is Java-based code, it
will use the PircBot API [Hack #35] . You can assume that the
onMessage method is overridden to accept input
from users in a channel. The rest of this step will just show how to
handle requests from users in this context.

Our Person class has all of the information from FOAF, but you can
decide which properties you want to be queryable through the IRC bot.
All of the people our bot knows about are indexed by email address or
by email sha1sum, the result of applying the SHA-1 hash function to a
mailto: identifier. For this reason, you will require users to ask for
information about a person via an email address. The original FOAFBot
also maintains a hash keyed by IRC nickname, since that is easier to
find on an IRC channel. To support that, you would simply add another
Hashtable to the preceding code, and add Person objects to it by
looping over the nick Vector, just as with the email addresses. In
the email-indexed bot, a sample query might look like this:

foafbot, name of golbeck@cs.umd.edu

Upon receiving this command, the bot looks up the address in the hash
to retrieve the associated Person object and then puts together a
response with the information stored in the object:

StringTokenizer t = new StringTokenizer(message);
if (t.nextToken( ).toLowerCase( ).equals(
        this.getName( ).toLowerCase( ) + ",")) {
    try {
        String query = t.nextToken( );
        if (query.equals("name")) {
            t.nextToken( ); // Eliminate the "of".
            String email = t.nextToken( );
            Person p = (Person) foafHash.get(email);
            String response = "";
            // p is null when the address is not in the hash.
            Vector data = (p == null) ? null : (Vector) p.foafData.get("name");
            if (data != null && data.size( ) > 0) {
                response = email + " is named ";
                for (int i = 0; i < data.size( ); i++) {
                    response += data.elementAt(i);
                    // This formats the response nicely with commas.
                    if (i + 1 < data.size( )) {
                        response += ", ";
                    }
                }
            }
            else {
                response = "I don't know the name of ";
                response += email;
            }
            sendMessage(channel, response);
        }
...

This is just one example of creating a response from the Person
object. You can decide which features of FOAF to support and how to
support them. With that, the FOAFBot is complete. This is not only an
interesting hack by itself, but it also lays the groundwork for any
other Semantic Web-based hacks. One of those, TrustBot [Hack #49], is next.
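Since some people appear in the hash only under their mbox_sha1sum, a
bot that accepts plain email addresses may want to compute that value
and try it as a second lookup key. The following is a minimal sketch
using only the standard MessageDigest class; MboxKeys and sha1sum are
hypothetical names, and the sketch assumes the checksum is the
lowercase hexadecimal SHA-1 digest of the full mailto: identifier.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of PircBot or Jena.
final class MboxKeys {
    // Returns the hex-encoded SHA-1 digest of the "mailto:" form of the address.
    static String sha1sum(String email) {
        String mailto = email.startsWith("mailto:") ? email : "mailto:" + email;
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(mailto.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1sum("golbeck@cs.umd.edu"));
    }
}
```

In the query handler, a lookup could first try foafHash.get(email)
and, if that returns null, fall back to
foafHash.get(MboxKeys.sha1sum(email)).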


7.6.4 Running the Hack


In this hack, the
Foaf.java file contains a main method. Since the
bot is based on PircBot, you need to change that. By simply renaming
the Foaf.java main method to
an init method and calling that
init method as one of the first steps in the main
method of your PircBot-based bot, the FOAF data crawl will be
initialized and stored before the bot joins a channel.

With this change, the only remaining step is to compile and run the
bot as usual (see [Hack #35]). When the bot joins a channel, it will
process any requests that you wrote code to handle.

A FOAFBot interface is demonstrated in Figure 7-5.


Figure 7-5. Using FOAFBot to find out about a user

Now you can use FOAFBot to find out about all the users in your
channel.

Jennifer Golbeck

