Mastering Perl for Bioinformatics [Electronic resources], text version

1.9 CPAN Modules



The
Comprehensive Perl Archive Network
(CPAN, http://www.cpan.org) is an
impressively large collection of Perl code (mostly Perl modules).
CPAN is easily accessible and searchable on the Web, and you can use
its modules for a variety of programming tasks.


By now you should have the basic idea of how modules are defined and
used, so let's take some time to explore CPAN to see
what goodies are available.


There are two important points about CPAN. First, a large number of
the things you might want your programs to do have already been
programmed and are easily obtained in downloadable modules. You just
have to go find them at CPAN, install them on your computer, and call
them from your program. We'll take a look at an
example of exactly that in this section.


Second, all code on CPAN is free of charge and available for use under
very unrestrictive copyright terms. Sound good? Keep reading.


CPAN includes convenient ways to search for useful modules, and the
CPAN.pm module, which comes bundled with Perl, makes downloading and
installing modules quite easy (when things work well, which they
usually do). If you can't find CPAN.pm on your system, you should
consider updating your Perl installation.


You can find more information by typing the following at the command
line:


perldoc CPAN


You can also check the Frequently Asked Questions (FAQ) available at
the CPAN web site.



1.9.1 What's Available at CPAN?



The CPAN web site offers several
"views" of the CPAN collection of
modules and several alternate ways of searching (by module name,
category, full text search of the module documentation, etc.). Here
is the top-level organization of the modules by overall category:


Development Support
Operating System Interfaces
Networking Devices IPC
Data Type Utilities
Database Interfaces
User Interfaces
Language Interfaces
File Names Systems Locking
String Lang Text Proc
Opt Arg Param Proc
Internationalization Locale
Security and Encryption
World Wide Web HTML HTTP CGI
Server and Daemon Utilities
Archiving and Compression
Images Pixmaps Bitmaps
Mail and Usenet News
Control Flow Utilities
File Handle Input Output
Microsoft Windows Modules
Miscellaneous Modules
Commercial Software Interfaces
Not In Modulelist


1.9.2 Searching CPAN



CPAN's main web page has a few ways to search the
contents. Let's say you need to perform some
statistics and are looking for code that's already
available. We'll go through the steps necessary to
search for the code, download and install it, and use the module in a
program.


At the main CPAN page, look for
"searching" and click on
search.cpan.org. If you search for
"statistics" in all locations,
you'll get over 300 hits, so you should restrict
your search to modules with the pull-down menu.
You'll get 25 hits (more by the time you read this);
here's what you'll see:


1.  Statistics::Candidates
Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest
2. Statistics::ChiSquare
How random is your data?
Statistics-ChiSquare-0.3 - 23 Nov 2001 - Jon Orwant
3. Statistics::Contingency
Calculate precision, recall, F1, accuracy, etc.
Statistics-Contingency-0.03 - 09 Aug 2002 - Ken Williams
4. Statistics::DEA
Discontiguous Exponential Averaging
Statistics-DEA-0.04 - 17 Aug 2002 - Jarkko Hietaniemi
5. Statistics::Descriptive
Module of basic descriptive statistical functions.
Statistics-Descriptive-2.4 - 26 Apr 1999 - Colin Kuskie
6. Statistics::Distributions
Perl module for calculating critical values of common statistical distributions
Statistics-Distributions-0.07 - 22 Jun 2001 - Michael Kospach
7. Statistics::Frequency
simple counting of elements
Statistics-Frequency-0.02 - 24 Apr 2002 - Jarkko Hietaniemi
8. Statistics::GaussHelmert
General weighted least squares estimation
Statistics-GaussHelmert-0.05 - 18 Apr 2002 - Stephan Heuel
9. Statistics::LTU
An implementation of Linear Threshold Units
Statistics-LTU-2.8 - 27 Feb 1997 - Tom Fawcett
10. Statistics::Lite
Small stats stuff.
Statistics-Lite-1.02 - 15 Apr 2002 - Brian Lalonde
11. Statistics::MaxEntropy
Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest
12. Statistics::OLS
perform ordinary least squares and associated statistics, v 0.07.
Statistics-OLS-0.07 - 13 Oct 2000 - Sanford Morton
13. Statistics::ROC
receiver-operator-characteristic (ROC) curves with nonparametric confidence bounds
Statistics-ROC-0.01 - 22 Jul 1998 - Hans A. Kestler
14. Statistics::Regression
weighted linear regression package (line+plane fitting)
StatisticsRegression - 26 May 2001 - ivo welch
15. Statistics::SparseVector
Perl5 extension for manipulating sparse bitvectors
Statistics-MaxEntropy-0.9 - 26 Nov 1998 - Hugo WL ter Doest
16. Statistics::Descriptive::Discrete
Compute descriptive statistics for discrete data sets.
Statistics-Descriptive-Discrete-0.07 - 13 Jun 2002 - Rhet Turnbull
17. Bio::Tree::Statistics
Calculate certain statistics for a Tree
bioperl-1.0.2 - 16 Jul 2002 - Ewan Birney
18. Device::ISDN::OCLM::Statistics
OCLM statistics superclass
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
19. Device::ISDN::OCLM::CurrentStatistics
OCLM current call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
20. Device::ISDN::OCLM::ISDNStatistics
OCLM ISDN statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
21. Device::ISDN::OCLM::Last10Statistics
OCLM Last10 call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
22. Device::ISDN::OCLM::LastStatistics
OCLM last call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
23. Device::ISDN::OCLM::ManualStatistics
OCLM manual call statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
24. Device::ISDN::OCLM::SPStatistics
OCLM service provider statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes
25. Device::ISDN::OCLM::SystemStatistics
OCLM system statistics
Device-ISDN-OCLM-0.40 - 02 Jan 2000 - Merlin Hughes


Let's check out the Statistics::ChiSquare
module.


First, click on the link to Statistics::ChiSquare;
you'll see a summary of the module, complete with a
description, overview, discussion of the method, examples of use, and
information about the author.


This module looks interesting; let's download and install it. How big
is the source code? If you click on the source link, you'll find that
the module is really just one short subroutine, with the documentation
embedded in the same file. Here's the subroutine definition part of
the module:


package Statistics::ChiSquare;
# ChiSquare.pm
#
# Jon Orwant, orwant@media.mit.edu
#
# 31 Oct 95, revised Mon Oct 18 12:16:47 1999, and again November 2001
# to fix an off-by-one error
#
# Copyright 1995, 1999, 2001 Jon Orwant. All rights reserved.
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#
# Version 0.3. Module list status is "Rdpf"
use strict;
use vars qw($VERSION @ISA @EXPORT);
require Exporter;
require AutoLoader;
@ISA = qw(Exporter AutoLoader);
# Items to export into callers namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.
@EXPORT = qw(chisquare);
$VERSION = '0.3';
my @chilevels = (100, 99, 95, 90, 70, 50, 30, 10, 5, 1);
my %chitable = ( );
# assume the expected probability distribution is uniform
sub chisquare {
my @data = @_;
@data = @{$data[0]} if @data == 1 and ref($data[0]);
my $degrees_of_freedom = scalar(@data) - 1;
my ($chisquare, $num_samples, $expected, $i) = (0, 0, 0, 0);
if (! exists($chitable{$degrees_of_freedom})) {
return "I can't handle ", scalar(@data),
" choices without a better table.";
}
foreach (@data) { $num_samples += $_ }
$expected = $num_samples / scalar(@data);
return "There's no data!" unless $expected;
foreach (@data) {
$chisquare += (($_ - $expected) ** 2) / $expected;
}
foreach (@{$chitable{$degrees_of_freedom}}) {
if ($chisquare < $_) {
return
"There's a >$chilevels[$i+1]% and <$chilevels[$i]% chance that this data is random.";
}
$i++;
}
return "There's a <$chilevels[$#chilevels]% chance that this data is random.";
}
$chitable{1} = [0.00016, 0.0039, 0.016, 0.15, 0.46, 1.07, 2.71, 3.84, 6.64];
$chitable{2} = [0.020, 0.10, 0.21, 0.71, 1.39, 2.41, 4.60, 5.99, 9.21];
$chitable{3} = [0.12, 0.35, 0.58, 1.42, 2.37, 3.67, 6.25, 7.82, 11.34];
$chitable{4} = [0.30, 0.71, 1.06, 2.20, 3.36, 4.88, 7.78, 9.49, 13.28];
$chitable{5} = [0.55, 1.14, 1.61, 3.00, 4.35, 6.06, 9.24, 11.07, 15.09];
$chitable{6} = [0.87, 1.64, 2.20, 3.83, 5.35, 7.23, 10.65, 12.59, 16.81];
$chitable{7} = [1.24, 2.17, 2.83, 4.67, 6.35, 8.38, 12.02, 14.07, 18.48];
$chitable{8} = [1.65, 2.73, 3.49, 5.53, 7.34, 9.52, 13.36, 15.51, 20.09];
$chitable{9} = [2.09, 3.33, 4.17, 6.39, 8.34, 10.66, 14.68, 16.92, 21.67];
$chitable{10} = [2.56, 3.94, 4.86, 7.27, 9.34, 11.78, 15.99, 18.31, 23.21];
$chitable{11} = [3.05, 4.58, 5.58, 8.15, 10.34, 12.90, 17.28, 19.68, 24.73];
$chitable{12} = [3.57, 5.23, 6.30, 9.03, 11.34, 14.01, 18.55, 21.03, 26.22];
$chitable{13} = [4.11, 5.89, 7.04, 9.93, 12.34, 15.12, 19.81, 22.36, 27.69];
$chitable{14} = [4.66, 6.57, 7.79, 10.82, 13.34, 16.22, 21.06, 23.69, 29.14];
$chitable{15} = [5.23, 7.26, 8.55, 11.72, 14.34, 17.32, 22.31, 25.00, 30.58];
$chitable{16} = [5.81, 7.96, 9.31, 12.62, 15.34, 18.42, 23.54, 26.30, 32.00];
$chitable{17} = [6.41, 8.67, 10.09, 13.53, 16.34, 19.51, 24.77, 27.59, 33.41];
$chitable{18} = [7.00, 9.39, 10.87, 14.44, 17.34, 20.60, 25.99, 28.87, 34.81];
$chitable{19} = [7.63, 10.12, 11.65, 15.35, 18.34, 21.69, 27.20, 30.14, 36.19];
$chitable{20} = [8.26, 10.85, 12.44, 16.27, 19.34, 22.78, 28.41, 31.41, 37.57];
1;


Some of this code will look familiar; some may not. Check out the use
of package, use
strict, and require
Exporter; they're parts of Perl
you've just seen.
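To see what require Exporter and @EXPORT accomplish in the listing, here is a minimal single-file sketch. The package My::Stats, its mean subroutine, and its version number are hypothetical, invented for illustration. In a real module the package would live in its own My/Stats.pm file and your program would simply say use My::Stats; here the import step that use performs is called by hand. The sketch declares package variables with our, the form Perl 5.6 and later offer as an alternative to use vars.

```perl
use strict;
use warnings;

# A hypothetical module defined inline so the example fits in one file.
# Normally it would live in My/Stats.pm and be loaded with "use My::Stats;".
package My::Stats;
require Exporter;
our @ISA     = qw(Exporter);   # inherit Exporter's import method
our @EXPORT  = qw(mean);       # names exported to the caller by default
our $VERSION = '0.1';          # "our" is the modern alternative to "use vars"

# Return the arithmetic mean of a list of numbers, or undef if it is empty.
sub mean {
    my @values = @_;
    return undef unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;     # an array in numeric context gives its length
}

package main;

# "use My::Stats;" would do this at compile time:
#     require My::Stats; My::Stats->import;
# The package is already defined above, so only the import call is needed.
My::Stats->import;

print "mean = ", mean(2, 4, 9), "\n";   # no My::Stats:: prefix needed
```

Running it should print mean = 5, even though mean is never called by its full name My::Stats::mean; that is the same service @EXPORT = qw(chisquare) provides for chisquare.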


You'll also see references to $VERSION, AutoLoader, use vars, and the
initialization of %chitable, a hash whose values are references to
arrays (a two-dimensional data structure); these will be covered
later. For now, you may want to take a quick read-through of the code
and get some personal satisfaction at how much of it makes sense.
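%chitable is a hash of arrays: each key is a number of degrees of freedom, and each value is a reference to an array of critical values. The following minimal sketch, using two rows copied from the listing, shows how such a structure is built and read. Position 7 in a row lines up with the 5% level, because element $i of a row corresponds to $chilevels[$i+1]. Variable names other than %chitable are hypothetical.

```perl
use strict;
use warnings;

# Two rows copied from the %chitable initialization in the listing.
# The square brackets create anonymous arrays; the hash stores references to them.
my %chitable;
$chitable{1} = [0.00016, 0.0039, 0.016, 0.15, 0.46, 1.07, 2.71, 3.84, 6.64];
$chitable{2} = [0.020, 0.10, 0.21, 0.71, 1.39, 2.41, 4.60, 5.99, 9.21];

my $df = 1;    # degrees of freedom: number of categories minus one

# chisquare uses exists() in the same way to reject unsupported sizes.
if (exists $chitable{$df}) {
    # A second subscript reaches into the referenced array
    # ($chitable{$df}[7] is shorthand for $chitable{$df}->[7]).
    my $five_percent = $chitable{$df}[7];
    # @{ ... } dereferences the whole row, here to count its entries.
    my $row_length = scalar @{ $chitable{$df} };
    print "df=$df: $row_length critical values; the 5% value is $five_percent\n";
}

print "Each value is a reference of type ", ref($chitable{2}), "\n";
```

This should report 9 critical values for df=1, a 5% value of 3.84, and a reference type of ARRAY.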


Indeed, one of the nicest things about most modules is that you
rarely have to read the code. Usually you can just install the
module, read enough of the documentation to see how to call it from
your program, and you're off and running. Let's take that approach
now.



1.9.3 Installing Modules Using CPAN.pm



Our next task is to install the module using
CPAN.pm. This section contains a log from when I
installed Statistics::ChiSquare on my Linux
computer using CPAN.pm.


To make things easy, here's the section of the CPAN FAQ that
addresses installing modules:


How do I install Perl modules?
Installing a new module can be as simple as typing
perl -MCPAN -e 'install Chocolate::Belgian'.
The CPAN.pm documentation has more complete instructions on how to use
this convenient tool. If you are uncomfortable with having something
take that much control over your software installation, or it otherwise
doesn't work for you, the perlmodinstall documentation covers
module installation for UNIX, Windows and Macintosh in more familiar terms.
Finally, if you're using ActivePerl on Windows, the PPM (Perl Package Manager)
has much of the same functionality as CPAN.pm.


The following is my install log. Notice that all I have to do is type
a couple of lines, and everything else that follows is automatic!


[tisdall@coltrane tisdall]$ perl -MCPAN -e 'install Statistics::ChiSquare'
CPAN: Storable loaded ok
mkdir /root/.cpan: Permission denied at /usr/local/lib/perl5/5.6.1/CPAN.pm line 2218
[tisdall@coltrane tisdall]$ su
Password:
[root@coltrane tisdall]# perl -MCPAN -e 'install Statistics::ChiSquare'
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
Database was generated on Wed, 20 Mar 2002 00:39:29 GMT
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
ftp://cpan.cse.msu.edu/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
CPAN: Compress::Zlib loaded ok
Fetching with LWP:
ftp://cpan.cse.msu.edu/modules/02packages.details.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
Database was generated on Mon, 26 Aug 2002 00:22:07 GMT
There's a new CPAN.pm version (v1.62) available!
[Current version is v1.59_54]
You might want to try
install Bundle::CPAN
reload cpan
without quitting the current session. It should be a seamless upgrade
while we are running...
Fetching with LWP:
ftp://cpan.cse.msu.edu/modules/03modlist.data.gz
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Going to write /root/.cpan/Metadata
Running install for module Statistics::ChiSquare
Running make for J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
Fetching with LWP:
ftp://cpan.cse.msu.edu/authors/id/J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
CPAN: MD5 loaded ok
Fetching with LWP:
ftp://cpan.cse.msu.edu/authors/id/J/JO/JONO/CHECKSUMS
Checksum for /root/.cpan/sources/authors/id/J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz ok
Scanning cache /root/.cpan/build for sizes
Deleting from cache: /root/.cpan/build/IO-stringy-2.108 (21.4>20.0 MB)
Deleting from cache: /root/.cpan/build/XML-Node-0.11 (20.8>20.0 MB)
Deleting from cache: /root/.cpan/build/bioperl-0.7.2 (20.7>20.0 MB)
Statistics/ChiSquare-0.3/
Statistics/ChiSquare-0.3/ChiSquare.pm
Statistics/ChiSquare-0.3/Makefile.PL
Statistics/ChiSquare-0.3/test.pl
Statistics/ChiSquare-0.3/Changes
Statistics/ChiSquare-0.3/MANIFEST
Package seems to come without Makefile.PL.
(The test -f "/root/.cpan/build/Statistics/Makefile.PL" returned false.)
Writing one on our own (setting NAME to StatisticsChiSquare)
CPAN.pm: Going to build J/JO/JONO/Statistics-ChiSquare-0.3.tar.gz
Checking if your kit is complete...
Looks good
Writing Makefile for Statistics::ChiSquare
Writing Makefile for StatisticsChiSquare
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
cp ChiSquare.pm ../blib/lib/Statistics/ChiSquare.pm
AutoSplitting ../blib/lib/Statistics/ChiSquare.pm (../blib/lib/auto/Statistics/ChiSquare)
Manifying ../blib/man3/Statistics::ChiSquare.3
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
/usr/bin/make -- OK
Running make test
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
PERL_DL_NONLAZY=1 /usr/bin/perl -I../blib/arch -I../blib/lib -I/usr/local/lib/perl5/5.6.1/i686-linux -I/usr/local/lib/perl5/5.6.1 test.pl
1..2
ok 1
ok 2
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
/usr/bin/make test -- OK
Running make install
make[1]: Entering directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
make[1]: Leaving directory `/root/.cpan/build/Statistics/ChiSquare-0.3'
Installing /usr/local/lib/perl5/site_perl/5.6.1/Statistics/ChiSquare.pm
Installing /usr/local/lib/perl5/site_perl/5.6.1/auto/Statistics/ChiSquare/autosplit.ix
Installing /usr/local/man/man3/Statistics::ChiSquare.3
Writing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux/auto/StatisticsChiSquare/.packlist
Appending installation info to /usr/local/lib/perl5/5.6.1/i686-linux/perllocal.pod
/usr/bin/make install UNINST=1 -- OK
[root@coltrane tisdall]#


This may seem like a confusing amount of output, but, again, all you
have to do is type a couple of lines, and the installation follows
automatically.


You may get something like the following message when you try to
install a CPAN module:


[tisdall@coltrane tisdall]$ perl -MCPAN -e 'install Statistics::ChiSquare'
CPAN: Storable loaded ok
mkdir /root/.cpan: Permission denied at /usr/local/lib/perl5/5.6.1/CPAN.pm line 2218


As you can see, it didn't work, and it produced an
error message. On Unix machines, it's often
necessary to become root to install things.[2] In that case, use the Unix
su command and try the CPAN command again:



[2] You may
need to contact your system administrator about getting root
permission. The CPAN documentation discusses how to do a non-root
installation. If you're not on a Unix or Linux
machine and are using ActiveState's Perl on a
Windows machine, for instance, you need to consult that
documentation.



[tisdall@coltrane tisdall]$ su
Password:
[root@coltrane tisdall]# perl -MCPAN -e 'install Statistics::ChiSquare'


Great, it worked. If you look over the rather verbose output, you'll
see that CPAN.pm finds the module, downloads it and verifies its
checksum, builds it, tests it, installs it, and logs the installation.


Pretty easy, huh?


It's usually this easy, but not always. Occasionally, errors result,
and the module may not be installed. In that case, the error messages
may be enough to explain the problem; for instance, the module may
depend on another module you have to install first. Another problem
is that some modules haven't been tested on, or even designed to work
on, all operating systems; if you try to install a Windows-specific
module on Linux, it is likely to complain. In extreme cases, you can
contact the module's author, whose email address is usually given in
the module documentation.



1.9.4 Using the Newly Installed CPAN Module



Now comes the payoff. Let's look again at the
documentation for the module and see if we can use it from our own
Perl code.


Now that the module is installed, you can see the documentation by
typing:


perldoc Statistics::ChiSquare


You can also simply go back to the web documentation found at
http://search.cpan.org. Either
way, you'll find the following example using this
ChiSquare module:


NAME
"Statistics::ChiSquare" - How random is your data?
SYNOPSIS
use Statistics::Chisquare;
print chisquare(@array_of_numbers);
Statistics::ChiSquare is available at a CPAN site near
you.
DESCRIPTION
Suppose you flip a coin 100 times, and it turns up heads
70 times. Is the coin fair?
Suppose you roll a die 100 times, and it shows 30 sixes.
Is the die loaded?
In statistics, the chi-square test calculates "how random"
a series of numbers is. But it doesn't simply say "yes"
or "no". Instead, it gives you a confidence interval,
which sets upper and lower bounds on the likelihood that
the variation in your data is due to chance. See the
examples below.
...


The documentation continues with more discussion and some concrete
examples that use the module and interpret the results.
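The coin example from the DESCRIPTION can be worked by hand. Under the module's assumption of a uniform expected distribution, the statistic is the sum over categories of (observed - expected)^2 / expected. For 70 heads and 30 tails the expected count is 50 for each side, giving (70 - 50)^2/50 + (30 - 50)^2/50 = 8 + 8 = 16. The sketch below recomputes that value with a hypothetical helper, chi_square_statistic, written to mirror the loop in the module source; it is an illustration, not part of the module's API.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the arithmetic in Statistics::ChiSquare:
# every category is assumed equally likely, so expected = total / categories.
sub chi_square_statistic {
    my @observed = @_;
    my $total = 0;
    $total += $_ for @observed;
    die "no data\n" unless $total;    # the module returns a message instead
    my $expected  = $total / @observed;
    my $statistic = 0;
    $statistic += ($_ - $expected) ** 2 / $expected for @observed;
    return $statistic;
}

my $stat = chi_square_statistic(70, 30);    # 70 heads, 30 tails
my $df   = 2 - 1;                           # two categories, one degree of freedom

# Largest df = 1 critical value in the module's table (the 1% level).
my $one_percent = 6.64;

printf "chi-square = %.2f (df = %d)\n", $stat, $df;
print $stat > $one_percent
    ? "Above the 1% critical value, so the module would report <1%.\n"
    : "Within the tabulated range.\n";
```

Because 16 exceeds every entry in the df = 1 row of %chitable, the module's loop runs off the end of the row and returns its final message, the <1% case.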


Very often, the SYNOPSIS part of the documentation
is all you need to look at. It shows you specific examples of how to
call the code in the module. In this case, because
it's a very simple module, there is just one
subroutine that can be used. As you see from the documentation
excerpt, you just need to pass the chisquare
subroutine an array of numbers and print out the return value to use
the code. Let's try it. We'll take
as our input an array of numbers that corresponds to the stops of the
Broadway-7th Avenue local subway train on the west side of Manhattan,
from 14th Street up to 137th Street in Harlem.
(We'll assume you didn't run fast
enough and missed the A train.) Let's see how random
these stops really are:


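use strict;
use warnings;
use Statistics::ChiSquare;   # exports chisquare() by default

# Street numbers of the Broadway-7th Avenue local stops, 14th St. to 137th St.
my @subwaystops = (14, 18, 23, 28, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110,
    116, 125, 137);

# chisquare returns a message string, not a number
print chisquare(@subwaystops), "\n";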


This produces the output:


There's a <1% chance that this data is random.


(Knowing firsthand the feelings of long-suffering New York City
Subway riders, I predict that this result might provoke some spirited
discussion. Nevertheless, we seem to have working code.)
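The module's chisquare subroutine also accepts a single array reference in place of a list, thanks to the line @data = @{$data[0]} if @data == 1 and ref($data[0]); in its source, so chisquare(\@subwaystops) should behave like chisquare(@subwaystops). The idiom is easy to reuse. In this minimal sketch the subroutine total is hypothetical, and a ref(...) eq 'ARRAY' test replaces the module's plainer ref(...) check so that other kinds of references are not dereferenced as arrays.

```perl
use strict;
use warnings;

# Hypothetical subroutine that accepts either a list of numbers
# or a single reference to an array of numbers.
sub total {
    my @data = @_;
    # Same idea as Statistics::ChiSquare: unwrap a lone array reference.
    @data = @{ $data[0] } if @data == 1 and ref($data[0]) eq 'ARRAY';
    my $sum = 0;
    $sum += $_ for @data;
    return $sum;
}

my @stops = (14, 18, 23);
print total(@stops), "\n";     # list form
print total(\@stops), "\n";    # array-reference form, same result
```

Both calls should print 55.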



1.9.5 Problems with CPAN Modules



Actually, the sharp-eyed reader may have noticed a problem in our mad
dash uptown. In the first line of the SYNOPSIS
section, there's the following:


use Statistics::Chisquare;


The name of the module is spelled Chisquare, whereas everywhere else in
the documentation it is spelled ChiSquare, with a capital S. In Perl,
the case of a letter, uppercase or lowercase, is significant, and this
looks suspiciously like a typographical error in the documentation. If
you try use Statistics::Chisquare, you'll discover (at least on a
case-sensitive filesystem, such as the one on Linux) that the module
can't be found, whereas if you try use Statistics::ChiSquare, the
module is there. This is a minor bug, but some modules are poorly
documented, and that can cost you a lot of time, especially if you are
forced to wade into the module code or run various tests to figure out
how the module works.
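Perl package names are case-sensitive no matter what filesystem you use, so Statistics::Chisquare and Statistics::ChiSquare are simply two different namespaces. The following sketch illustrates this without needing any CPAN module: Demo::ChiSquare is a hypothetical package defined inline, and the built-in can method (which every package inherits from UNIVERSAL) reports whether a package provides a given subroutine.

```perl
use strict;
use warnings;

# Hypothetical package standing in for a real module.
package Demo::ChiSquare;
sub chisquare { return 'Demo::ChiSquare::chisquare was called' }

package main;

# can() returns a reference to the subroutine if the package provides it,
# and a false value otherwise; the two names differ only in one letter's case.
for my $name ('Demo::ChiSquare', 'Demo::Chisquare') {
    my $status = $name->can('chisquare') ? 'found' : 'not found';
    print "$name: chisquare $status\n";
}
```

It should report the subroutine as found under Demo::ChiSquare and not found under Demo::Chisquare.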


Apart from bugs, I've also mentioned the problem
that some modules are not tested, or designed, for all operating
systems. In addition, many modules require other modules to be
present. It's possible to configure CPAN to
automatically install all the required modules a requested module
uses, as described in the CPAN documentation, but you may need to
intervene personally. It's useful to remember that
if you have a program that uses a certain module running on one
computer, and you move the program to another computer, you may have
to install the required modules on the new computer as well.
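When you move a program to another computer, one quick way to see which of its modules are missing is to try loading each one inside eval. The following is a minimal sketch, assuming a hypothetical list of prerequisites; it converts each module name to the file path require expects (Statistics::ChiSquare becomes Statistics/ChiSquare.pm) so that the name can be held in a variable. Any module it reports as missing could then be installed with CPAN.pm as shown earlier.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded on this computer, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;   # Statistics::ChiSquare -> Statistics/ChiSquare
    $file .= '.pm';
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical list of modules a program depends on.
my @prerequisites = qw(strict List::Util Statistics::ChiSquare);

for my $module (@prerequisites) {
    if (module_available($module)) {
        my $version = $module->VERSION;  # the module's $VERSION, if it sets one
        printf "%-25s ok (version %s)\n", $module,
            defined $version ? $version : 'unknown';
    }
    else {
        printf "%-25s MISSING\n", $module;
    }
}
```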


Saving the worst for last, it's also important to remember that
contributing to CPAN is open to one and all, and not all the code
there is well written or well tested. The heavily used modules
generally are, but counterexamples can be found. So don't bet the
farm on your code just because it uses a CPAN module; you should
still carefully read the module's documentation and test your
program.


The CPAN FAQ explains in detail the way to be a good citizen when it
comes to testing and reporting bugs that you discover in CPAN
code.


