Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Mastering Perl for Bioinformatics [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید












9.6 bptutorial.pl



I've already shown you a little of the
bptutorial.pl
document. I ran and discussed a few of the short example programs in
the preceding sections.


As you know, one of the easiest ways to get started with a
programming system is to find some working and fairly generic
programs in that system. You can read and run the programs, and then
proceed to alter them using them as templates for your own
programming development.


Bioperl comes with a directory of example programs, but the best
place to begin looking for starting-off program code is right in the
bptutorial.pl document itself. That
.pl suffix on the name is the giveaway; the
document is actually itself a program, cleverly designed so that you
can read, and run, example programs that exercise the core parts of
the Bioperl project.


The following explanation of the runnable programs that are part of
bptutorial.pl appears at the end of the document
(when you view it on the Web or as the output of
perldoc bptutorial.pl).


       V.2 Appendix: Tutorial demo scripts
The following scripts demonstrate many of the features of
bioperl. To run all the core demos, run:
> perl -w bptutorial.pl 0
To run a subset of the scripts do
> perl -w bptutorial.pl
and use the displayed help screen.
It may be best to start by just running one or two demos
at a time. For example, to run the basic sequence manipu-
lation demo, do:
> perl -w bptutorial.pl 1
Some of the later demos require that you have an internet
connection and/or that you have an auxilliary bioperl
library and/or external cpan module and/or external pro-
gram installed. They may also fail if you are not running
under Linux or Unix. In all of these cases, the script
should fail "gracefully" simply saying the demo is being
skipped. However if the script "crashes", simply run the
other demos individually (and perhaps send an email to
bioperl-l@bioperl.org detailing the problem :-).


(Recall that the -w flag to Perl turns on warnings
in almost the same manner as a use
warnings; directive.)


To test my Bioperl installation, I started by running the basic
sequence manipulation demo as suggested.


First, I thought I might copy the bptutorial.pl
program file into my own working directory from the Bioperl
distribution directory where I'd unpacked the source
code. I wanted to put it in my own directory so as not to muddy up
the Bioperl distribution directory with my own extraneous files.
However, I discovered that the tutorial demo programs rely on a
number of datafiles that are found in the t/data/
subdirectory of the Bioperl distribution. Running the
program in my own directory gave me an error because the program
evidently requires a FASTA-formatted file called
dna1.fa:


[tisdall]$ perl -w bptutorial.pl 1
Beginning sequence_manipulations and SeqIO example...
------------- EXCEPTION -------------
MSG: Could not open t/data/dna1.fa: No such file or directory
STACK Bio::Root::IO::_initialize_io /usr/local/lib/perl5/site_perl/5.8.0/Bio/Root/IO.
pm:264
STACK Bio::SeqIO::_initialize /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm:437
STACK Bio::SeqIO::fasta::_initialize /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO/
fasta.pm:80
STACK Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm:355
STACK Bio::SeqIO::new /usr/local/lib/perl5/site_perl/5.8.0/Bio/SeqIO.pm:368
STACK main::_ _ANON_ _ bptutorial.pl:2758
STACK toplevel bptutorial.pl:3933
--------------------------------------
[tisdall]$


I decided it would be easier to run bptutorial.pl
from the distribution directory. I entered that directory; on my
Linux machine, I unpacked the Bioperl source code into
/usr/local/src/bioperl-1.2.1. I then tried to
run the demo:


[tisdall]$ cd /usr/local/src/bioperl-1.2.1
[tisdall]$ perl -w bptutorial.pl 1
Beginning sequence_manipulations and SeqIO example...
First sequence in fasta format...
>Test1
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACC
Seq object display id is Test1
Sequence is
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTAC
CTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATT
ACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
Sequence from 5 to 10 is TTTCAT
Acc num is unknown
Moltype is dna
Primary id is Test1
Truncated Seq object sequence is TTTCAT
Reverse complemented sequence 5 to 10 is ATGAAA
Translated sequence 6 to 15 is LQRAICLCVD
Beginning 3-frame and alternate codon translation example...
ctgagaaaataa translated using method defaults : LRK*
ctgagaaaataa translated as a coding region (CDS): MRK
Translating in all six frames:
frame: 0 forward: LRK*
frame: 0 reverse-complement: LFSQ
frame: 1 forward: *EN
frame: 1 reverse-complement: YFL
frame: 2 forward: EKI
frame: 2 reverse-complement: IFS
Translating with all codon tables using method defaults:
1 : LRK*
2 : L*K*
3 : TRK*
4 : LRK*
5 : LSK*
6 : LRKQ
9 : LSN*
10 : LRK*
11 : LRK*
12 : SRK*
13 : LGK*
14 : LSNY
15 : LRK*
16 : LRK*
21 : LSN*
[tisdall]$


That seemed to run without error. Now I wanted to see the Perl code
that had run and produced that output.


Exploring a bit, I found that the POD documentation part of
bptutorial.pl is roughly the first half of the
file, and that the second half of the file contains the Perl code for
the demos.


The author attributions and the libraries that are loaded are at the
beginning of the Perl code section, somewhere near the middle of the
bptutorial.pl file:


#!/usr/bin/perl
# PROGRAM : bptutorial.pl
# PURPOSE : Demonstrate various uses of the bioperl package
# AUTHOR : Peter Schattner schattner@alum.mit.edu
# CREATED : Dec 15 2000
# REVISION : $Id: ch09.xml,v 1.2 2003/10/03 19:16:55 becki Exp chodacki $
use strict;
use Bio::SimpleAlign;
use Bio::AlignIO;
use Bio::SeqIO;
use Bio::Seq;


That seems like a good bet for a list of the most important Bioperl
modules. Actually, some other modules are loaded for individual
demos; at various points, the following come into play:


use Bio::SearchIO;
use Bio::Root::IO;
use Bio::MapIO;
use Bio::TreeIO;
use Bio::Perl


Following those use Module
directives comes the following:


# subroutine references
my ($access_remote_db, $index_local_db, $fetch_local_db,
$sequence_manipulations, $seqstats_and_seqwords,
$restriction_and_sigcleave, $other_seq_utilities, $run_remoteblast,
$run_standaloneblast, $blast_parser, $bplite_parsing, $hmmer_parsing,
$run_clustalw_tcoffee, $run_psw_bl2seq, $simplealign,
$gene_prediction_parsing, $sequence_annotation, $largeseqs,
$run_tree, $run_map, $run_struct, $run_perl, $searchio_parsing,
$liveseqs, $demo_variations, $demo_xml, $display_help, $bpinspect1 );
# global variable file names. Edit these if you want to try
#out a tutorial script on a different file
Bio::Root::IO->catfile("t","data","ecolitst.fa");
# used in $sequence_manipulations
my $dna_seq_file = Bio::Root::IO->catfile("t","data","dna1.fa");
# used in $other_seq_utilities and in $run_perl and $sequence_annotation
my $amino_seq_file = Bio::Root::IO->catfile("t","data","cysprot1.fa");
# used in $blast_parser
my $blast_report_file = Bio::Root::IO->catfile("t","data","blast.report");
# used in $bplite_parsing
my $bp_parse_file1 = Bio::Root::IO->catfile("t","data","blast.report");
# used in $bplite_parsing
my $bp_parse_file2 = Bio::Root::IO->catfile("t","data","psiblastreport.out");
# used in $bplite_parsing
my $bp_parse_file3 = Bio::Root::IO->catfile("t","data","bl2seq.out");
# used in $run_clustalw_tcoffee
my $unaligned_amino_file = Bio::Root::IO->catfile("t","data","cysprot1a.fa");
# used in $simplealign
my $aligned_amino_file = Bio::Root::IO->catfile("t","data","testaln.pfam");
# other global variables
my (@runlist, $n );


A look at the documentation (perldoc
Bio::Root::IO) shows that the catfile method
returns the pathname of a file, which is being saved in a scalar
variable such as $dna_seq_file. This method is
there for portability between operating systems, because operating
systems each have their own syntax for specifying pathnames.


After that comes the code for the help screen, the output of which
you've already seen; next comes the individual demo
subroutines; and finally the code for the main subroutine, which you
saw in the last section.


At the very end of the bptutorial.pl file, I found
the part of the code that launches all the demos:


## "main" program follows
#no strict 'refs';
@runlist = @ARGV;
# display help if no option
if (scalar(@runlist)= =0) {&$display_help;};
# argument = 0 means run tests 1 thru 22
if ($runlist[0] = = 0) {@runlist = (1..22); };
foreach $n (@runlist) {
if ($n = =100) {my $object = $runlist[1]; &$bpinspect1($object); last;}
if ($n = =1) {&$sequence_manipulations; next;}
if ($n = =2) {&$seqstats_and_seqwords; next;}
if ($n = =3) {&$restriction_and_sigcleave; next;}
if ($n = =4) {&$other_seq_utilities; next;}
if ($n = =5) {&$run_perl; next;}
if ($n = =6) {&$searchio_parsing; next;}
if ($n = =7) {&$bplite_parsing; next;}
if ($n = =8) {&$hmmer_parsing; next;}
if ($n = =9) {&$simplealign ; next;}
if ($n = =10) {&$gene_prediction_parsing; next;}
if ($n = =11) {&$access_remote_db; next;}
if ($n = =12) {&$index_local_db; next;}
if ($n = =13) {&$fetch_local_db; next;}
if ($n = =14) {&$sequence_annotation; next;}
if ($n = =15) {&$largeseqs; next;}
if ($n = =16) {&$liveseqs; next;}
if ($n = =17) {&$run_struct; next;}
if ($n = =18) {&$demo_variations; next;}
if ($n = =19) {&$demo_xml; next;}
if ($n = =20) {&$run_tree; next;}
if ($n = =21) {&$run_map; next;}
if ($n = =22) {&$run_remoteblast; next;}
if ($n = =23) {&$run_standaloneblast; next;}
if ($n = =24) {&$run_clustalw_tcoffee; next;}
if ($n = =25) {&$run_psw_bl2seq; next;}
&$display_help;
}
## End of "main"


So, I searched for sequence_manipulation and found
the following code for this, the first of the
bptutorial.pl demos:


#################################################
# sequence_manipulations ( ):
#
$sequence_manipulations = sub {
my ($infile, $in, $out, $seqobj);
$infile = $dna_seq_file;
print "\nBeginning sequence_manipulations and SeqIO example... \n";
# III.3.1 Transforming sequence files (SeqIO)
$in = Bio::SeqIO->new('-file' => $infile ,
'-format' => 'Fasta');
$seqobj = $in->next_seq( );
# perl "tied filehandle" syntax is available to SeqIO,
# allowing you to use the standard <> and print operations
# to read and write sequence objects, eg:
#$out = Bio::SeqIO->newFh('-format' => 'EMBL');
$out = Bio::SeqIO->newFh('-format' => 'fasta');
print "First sequence in fasta format... \n";
print $out $seqobj;
# III.4 Manipulating individual sequences
# The following methods return strings
print "Seq object display id is ",
$seqobj->display_id( ), "\n"; # the human read-able id of the sequence
print "Sequence is ",
$seqobj->seq( )," \n"; # string of sequence
print "Sequence from 5 to 10 is ",
$seqobj->subseq(5,10)," \n"; # part of the sequence as a string
print "Acc num is ",
$seqobj->accession_number( ), " \n"; # when there, the accession number
print "Moltype is ",
$seqobj->alphabet( ), " \n"; # one of 'dna','rna','protein'
print "Primary id is ", $seqobj->primary_seq->primary_id( )," \n";
# a unique id for this sequence irregardless
#print "Primary id is ", $seqobj->primary_id( ), " \n";
# a unique id for this sequence irregardless
# of its display_id or accession number
# The following methods return an array of Bio::SeqFeature objects
$seqobj->top_SeqFeatures; # The 'top level' sequence features
$seqobj->all_SeqFeatures; # All sequence features, including sub
# seq features
# The following methods returns new sequence objects,
# but do not transfer features across
# truncation from 5 to 10 as new object
print "Truncated Seq object sequence is ",
$seqobj->trunc(5,10)->seq( ), " \n";
# reverse complements sequence
print "Reverse complemented sequence 5 to 10 is ",
$seqobj->trunc(5,10)->revcom->seq, " \n";
# translation of the sequence
print "Translated sequence 6 to 15 is ",
$seqobj->translate->subseq(6,15), " \n";
my $c = shift;
$c ||= 'ctgagaaaataa';
print "\nBeginning 3-frame and alternate codon translation example... \n";
my $seq = new Bio::PrimarySeq('-SEQ' => $c,
'-ID' => 'no.One');
print "$c translated using method defaults : ",
$seq->translate->seq, "\n";
# Bio::Seq uses same sequence methods as PrimarySeq
my $seq2 = new Bio::Seq('-SEQ' => $c, '-ID' => 'no.Two');
print "$c translated as a coding region (CDS): ",
$seq2->translate(undef, undef, undef, undef, 1)->seq, "\n";
print "\nTranslating in all six frames:\n";
my @frames = (0, 1, 2);
foreach my $frame (@frames) {
print " frame: ", $frame, " forward: ",
$seq->translate(undef, undef, $frame)->seq, "\n";
print " frame: ", $frame, " reverse-complement: ",
$seq->revcom->translate(undef, undef, $frame)->seq, "\n";
}
print "Translating with all codon tables using method defaults:\n";
my @codontables = qw( 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 );
foreach my $ct (@codontables) {
print $ct, " : ",
$seq->translate(undef, undef, undef, $ct)->seq, "\n";
}
return 1;
} ;
#################################################


/ 156