Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Mastering Perl for Bioinformatics [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید












9.7 bptutorial.pl: sequence_manipulation Demo



In this section, I'll go through the code for the
demo subroutine sequence_manipulation that was
shown in the last section.


The subroutine is actually an anonymous subroutine; a reference to
the subroutine is saved in the scalar reference variable
$sequence_manipulation:


$sequence_manipulations = sub {
...
}


The first few lines of code declare some variables with
my. Notice that these are not being passed in as
arguments; this method uses no arguments but does occasionally use
global variables such as $dna_seq_file, which, as
you've just seen, contain the pathname of the input
sequence file the demo will use:


my ($infile, $in, $out, $seqobj);
$infile = $dna_seq_file;
print "\nBeginning sequence_manipulations and SeqIO example... \n";


The code is cross-referenced to the tutorial sections of the file.
The next comment line refers to the part of the document:


# III.3.1 Transforming sequence files (SeqIO)


which can be looked up in the table of contents to the document for
further reading:


III.3 Manipulating sequences
III.3.1 Manipulating sequence data with Seq methods (Seq)


Now, I'll take a look at the first section of
example code in the sequence_manipulations method:


# III.3.1 Transforming sequence files (SeqIO)
$in = Bio::SeqIO->new('-file' => $infile ,'-format' => 'Fasta');
$seqobj = $in->next_seq( );
# perl "tied filehandle" syntax is available to SeqIO,
# allowing you to use the standard <> and print operations
# to read and write sequence objects, eg:
#$out = Bio::SeqIO->newFh('-format' => 'EMBL');
$out = Bio::SeqIO->newFh('-format' => 'fasta');
print "First sequence in fasta format... \n";
print $out $seqobj;


The code starts with a call to the new object constructor of the
Bio::SeqIO class. The new method is being passed
the pathname to a FASTA file in $infile, and told
that the format is FASTA.


A quick look at the Bio::SeqIO documentation
explains that the call to Bio::SeqIO->new
returns a stream object for the specified format. So,
$out is a stream object (a stream is input or
output of data) for FASTA-formatted data, and $in
is a stream object for FASTA-formatted input from the file named in
the $infile variable. These $in
and $out objects are also filehandles.


After the $in object is initialized on the FASTA
file named in $infile, it calls the
next_seq method, which gets the next (in this
case, the first and perhaps only) FASTA record from the file, and it
creates a sequence object $seqobj. The output
$out object is created. The Perl
print statement is then called, using
$out as a filehandle, and printing
$seqobj. This prints the first FASTA record from
that file, as shown in the demo output in the last section and
repeated here:


First sequence in fasta format... 
>Test1
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACC


The next part of the sequence_manipulation demo
code shows many of the methods for extracting information from a
sequence object:


# III.4 Manipulating individual sequences
# The following methods return strings
print "Seq object display id is ",
$seqobj->display_id( ), "\n"; # the human read-able id of the sequence
print "Sequence is ",
$seqobj->seq( )," \n"; # string of sequence
print "Sequence from 5 to 10 is ",
$seqobj->subseq(5,10)," \n"; # part of the sequence as a string
print "Acc num is ",
$seqobj->accession_number( ), " \n"; # when there, the accession number
print "Moltype is ",
$seqobj->alphabet( ), " \n"; # one of 'dna','rna','protein'
print "Primary id is ", $seqobj->primary_seq->primary_id( )," \n";
# a unique id for this sequence irregardless
#print "Primary id is ", $seqobj->primary_id( ), " \n";
# a unique id for this sequence irregardless
# of its display_id or accession number
# The following methods return an array of Bio::SeqFeature objects
$seqobj->top_SeqFeatures; # The 'top level' sequence features
$seqobj->all_SeqFeatures; # All sequence features, including sub
# seq features
# The following methods returns new sequence objects,
# but do not transfer features across
# truncation from 5 to 10 as new object
print "Truncated Seq object sequence is ",
$seqobj->trunc(5,10)->seq( ), " \n";
# reverse complements sequence
print "Reverse complemented sequence 5 to 10 is ",
$seqobj->trunc(5,10)->revcom->seq, " \n";
# translation of the sequence
print "Translated sequence 6 to 15 is ",
$seqobj->translate->subseq(6,15), " \n";


This section of the tutorial produced the following output (slightly
reformatted to fit the pages of this book):


Seq object display id is Test1
Sequence is AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGA
TAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACC
AATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
Sequence from 5 to 10 is TTTCAT
Acc num is unknown
Moltype is dna
Primary id is Test1
Truncated Seq object sequence is TTTCAT
Reverse complemented sequence 5 to 10 is ATGAAA
Translated sequence 6 to 15 is LQRAICLCVD


This part of the demo code is self-explanatory. As you can see, the
methods return:



The FASTA ID (the part immediately following the > sign in the
first line)



The sequence alone as a string (reformatted with line breaks to fit
this book)



The accession number if it is known (FASTA format
won't have this, but Genbank format will, for
instance)



The type of molecule (dna, rna, or protein)



A unique ID compared to other IDs in the running program but drawn
from the file itself if possible



The sequence of a new truncated sequence object defined from the
original sequence object; a reverse complement



The peptide resulting from a translation of a specified part of a
sequence




The final section of the sequence_manipulation
demo demonstrates the ability to translate nucleotides into all six
reading frames using alternate codon translation tables if desired:


my $c = shift;
$c ||= 'ctgagaaaataa';
print "\nBeginning 3-frame and alternate codon translation example... \n";
my $seq = new Bio::PrimarySeq('-SEQ' => $c, '-ID' => 'no.One');
print "$c translated using method defaults : ",
$seq->translate->seq, "\n";
# Bio::Seq uses same sequence methods as PrimarySeq
my $seq2 = new Bio::Seq('-SEQ' => $c, '-ID' => 'no.Two');
print "$c translated as a coding region (CDS): ",
$seq2->translate(undef, undef, undef, undef, 1)->seq, "\n";
print "\nTranslating in all six frames:\n";
my @frames = (0, 1, 2);
foreach my $frame (@frames) {
print " frame: ", $frame, " forward: ",
$seq->translate(undef, undef, $frame)->seq, "\n";
print " frame: ", $frame, " reverse-complement: ",
$seq->revcom->translate(undef, undef, $frame)->seq, "\n";
}
print "Translating with all codon tables using method defaults:\n";
my @codontables = qw( 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 );
foreach my $ct (@codontables) {
print $ct, " : ",
$seq->translate(undef, undef, undef, $ct)->seq, "\n";
}


This produces the output:


Beginning 3-frame and alternate codon translation example... 
ctgagaaaataa translated using method defaults : LRK*
ctgagaaaataa translated as a coding region (CDS): MRK
Translating in all six frames:
frame: 0 forward: LRK*
frame: 0 reverse-complement: LFSQ
frame: 1 forward: *EN
frame: 1 reverse-complement: YFL
frame: 2 forward: EKI
frame: 2 reverse-complement: IFS
Translating with all codon tables using method defaults:
1 : LRK*
2 : L*K*
3 : TRK*
4 : LRK*
5 : LSK*
6 : LRKQ
9 : LSN*
10 : LRK*
11 : LRK*
12 : SRK*
13 : LGK*
14 : LSNY
15 : LRK*
16 : LRK*
21 : LSN*


Again, this code is fairly easy to follow, although you may find
yourself turning to the documentation for some of the finer points
when you try to use these objects and methods in your own code.


This part of the code starts by getting an argument of sequence data
if one is available to be shifted off of the @_
argument array and into the variable $c;
otherwise, it sets the variable $c to a preset
sequence:


my $c = shift;
$c ||= 'ctgagaaaataa';


Briefly, the methods that follow in this part of the
sequence_manipulation demo do the following:



Call the PrimarySeq::new constructor to create a
lightweight sequence object from $c that
doesn't have the extra annotation often present in a
standard sequence file (see perldoc
Bio::PrimarySeq for the whole story)



Translate the sequence (from $c)



Call the Bio::Seq constructor to create the
$seq2 sequence object (from the same sequence data
in $seq)



Translate the CDS in the sequence object $seq2



Translate the sequence in all six reading frames



Translate the sequence using all 21 defined codon tables




The translate method seems to have lots of
interesting options. However, if you try to look up the documentation
for this method, you may have difficulty finding it. The problem is
that Bioperl classes often make considerable use of inheritance. Say
that the code is calling a method on a
Bio::PrimarySeq object, as do the following lines
from the demo (somewhat separated in the original):


my $seq = new Bio::PrimarySeq('-SEQ' => $c, '-ID' => 'no.One');
$seq->translate(undef, undef, $frame)->seq, "\n";


You may try perldoc
Bio::PrimarySeq and not find a discussion of the
translate method because it is being inherited
from some other class. And since multiple inheritance is possible, it
can take considerable effort to track down this method documentation
by figuring out the parent classes and searching the documentation.


There's a very convenient way to find the
documentation for a method, even if it's only
inheritedit's built into the
bptutorial.pl script. In the example just
mentioned, you ask for the list of methods used by the
Bio::PrimarySeq method.


[tisdall]$ perl bptutorial.pl 100 Bio::PrimarySeq
***Methods for Object Bio::PrimarySeq ********
Methods taken from package Bio::DescribableI
description display_name
Methods taken from package Bio::IdentifiableI
authority lsid_string namespace namespace_string object_id version
Methods taken from package Bio::PrimarySeq
accession direct_seq_set validate_seq
Methods taken from package Bio::PrimarySeqI
accession_number alphabet can_call_new desc display_id id
is_circular length moltype primary_id revcom seq
subseq translate trunc
Methods taken from package Bio::Root::Root
DESTROY debug verbose
Methods taken from package Bio::Root::RootI
carp confess deprecated new stack_trace stack_trace_dump
throw throw_not_implemented warn warn_not_implemented
[tisdall]$


Sure enough, there under the heading "Methods taken
from package Bio::PrimarySeqI" appears the method
translate. You should then look at the
Bio::PrimarySeqI documentation and find the
following method description:


         translate
Title : translate
Usage : $protein_seq_obj = $dna_seq_obj->translate
#if full CDS expected:
$protein_seq_obj = $cds_seq_obj->translate(undef,undef,undef,undef,1);
Function:
Provides the translation of the DNA sequence using full
IUPAC ambiguities in DNA/RNA and amino acid codes.
The full CDS translation is identical to EMBL/TREMBL
database translation. Note that the trailing terminator
character is removed before returning the translation
object.
Note: if you set $dna_seq_obj->verbose(1) you will get a
warning if the first codon is not a valid initiator.
Returns : A Bio::PrimarySeqI implementing object
Args : character for terminator (optional) defaults to '*'
character for unknown amino acid (optional) defaults to 'X'
frame (optional) valid values 0, 1, 2, defaults to 0
codon table id (optional) defaults to 1
complete coding sequence expected, defaults to 0 (false)
boolean, throw exception if not complete CDS (true) or
defaults to warning (false)


You will want to explore at least a few more of the tutorials
embedded in bptutorial.pl, as
we've done here with the first of the
tutorials.



/ 156