Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Mastering Perl for Bioinformatics [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید












5.3 Restriction.pm: Finding Recognition Sites



The
time has now come to write the class
that creates an object out of a sequence, enzyme name(s), and the map
of the location(s) of the enzyme recognition sites in the sequence.


This module depends a great deal on the module
Rebase developed in the previous section, but it
is fairly short because it just tries to do a small job. This new
Restriction class takes a
Rebase object (which has the Rebase database
translated to regular expressions), some sequence, and a list of
enzymes, and uses the regular expressions to find the recognition
sites in the sequence. (Note that it doesn't use
inheritance; it simply creates a Rebase object to
use.)


In this module, the restriction map (the list of locations where the
enzymes have recognition sites in the sequence) is obtained through
an auxiliary method map_enzyme that simply lists
the locations. Clearly, a more graphical display would be easier and
more useful. I'll consider that as the book
progresses.



5.3.1 The Restriction.pm Module



Here is the Restriction.pm module:


package Restriction;
#
# A class to find locations of restriction enzyme recognition sites in
# DNA sequence data.
#
use strict;
use warnings;
use Carp;
# Class data and methods
{
# A list of all attributes with default values.
# "enzyme" is given as an argument possibly multiple time, set as key to _map hash
my %_attributes = (
_rebase => { }, # A Rebase.pm hash-based object
# key = restriction enzyme name
# value = space-separated string of recognition sites => regular expressions
_sequence => '', # DNA sequence data in raw format (only bases)
_map => { },# a hash: keys are enzyme names,
# values are arrays of locations
_enzyme => '', # space- or comma-separated enzyme names,
# set as key to _map hash
);
# Global variable to keep count of existing objects
my $_count = 0;
# Return a list of all attributes
sub _all_attributes {
keys %_attributes;
}
# Manage the count of existing objects
sub get_count {
$_count;
}
sub _incr_count {
++$_count;
}
sub _decr_count {
--$_count;
}
}


This opening block shows that Restriction is also
a class with a small and simple set of attributes. One is just the
DNA sequence data, _sequence.


The attribute _map is a hash that stores the
computed restriction map with a key for each restriction enzyme and a
value consisting of an array of the positions in the sequence where
recognition sites for that enzyme occur.


The attribute _enzyme is one or more enzyme names
separated by spaces or commas.


The remaining attribute _rebase is an object of
the class I developed in the last section, the class
Rebase. Recall that a Rebase
object gives you the ability to get regular expressions for
recognition sites of restriction enzymes. When you call
Restriction->new you must include as one of its
arguments a Rebase object, as is shown in the
example given in the POD documentation later in this section.


Since a program can have many restriction maps, I also maintain the
count of Restriction objects.



5.3.1.1 Initializing Restriction objects



Next, the new
constructor method creates and initializes
Restriction objects:


# The "new" constructor method, called from class, e.g.
sub new {
my ($class, %args) = @_;
# Create a new object
my $self = bless { }, $class;
# Set the attributes for the provided arguments
foreach my $attribute ($self->_all_attributes( )) {
# E.g. attribute = "_name", argument = "name"
my($argument) = ($attribute =~ /^_(.*)/);
if (exists $args{$argument}) {
if($argument eq 'enzyme') {
# permit space or comma separated enzyme names
$args{$argument} =~ s/,/ /g;
}
$self->{$attribute} = $args{$argument};
}
}
# Check that the correct arguments are given
if( not defined $self->{_rebase} ) {
croak "A Rebase object must be given as an argument";
}elsif( ref($self->{_rebase}) ne 'Rebase' ) {
croak "The argument to rebase is not a Rebase object";
}elsif( not defined $self->{_sequence} ) {
croak "A sequence must be given as an argument";
}
# Calculate the locations for each enzyme, store in _map hash attribute
foreach my $enzyme (split(" ", $self->{_enzyme})) {
$self->map_enzyme($enzyme);
}
$self->_incr_count;
return $self;
}
# For this simple class I have no AUTOLOAD or DESTROY
# No get_rebase method, I don't want to pass around a huge hash
# No set mutators: all initialization done by way of "new" constructor
# No clone method. Each sequence and set of enzymes can be easily calculated
# by means of a "new" command.


5.3.1.2 The methods explained



The new constructor method checks that the proper
and required arguments are provided: a sequence, enzyme(s), and a
Rebase object.


Then the new constructor performs the required
computation by calling the method map_enzyme for
each enzyme to determine the (possibly empty) array of locations in
which the enzyme can cut the sequence. These enzyme-array key/value
pairs are stored in the _map attribute.


As you can see in the next segment of code from
Restriction.pm, the method
map_enzyme works by getting the regular
expressions that have been calculated for the enzyme and then finding
the positions where the regular expressions match the sequence.


The get_regular_expressions method accesses the
Rebase object that was passed to the
Restriction->new constructor method and appears
as the $self->{_rebase} attribute. In that
object, the attribute _rebase is the hash of the
database, which looks for the value of the key
$enzyme.


The match_positions method puts a match of a
regular expression in a while loop, where it loops
once for each location it finds. The offset of the matching sequence
is available in the special variable $-[0] after a
successful regular-expression match (see the
perlvar section of the Perl documentation for more
details).


sub map_enzyme {
my($self, $enzyme) = @_;
my(@positions) = ( );
my(@res) = $self->get_regular_expressions($enzyme);
foreach my $re (@res) {
push @positions, $self->match_positions($re);
}
@{$self->{_map}{$enzyme}} = @positions;
return @positions;
}
sub get_regular_expressions {
my($self, $enzyme) = @_;
my(%sites) = split(' ', $self->{_rebase}{_rebase}{$enzyme});
# May have duplicate values
return values %sites;
}
# Find positions of a regular expression in the sequence
sub match_positions {
my($self, $regexp) = @_;
my @positions = ( );
# Determine positions of regular expression matches
while ( $self->{_sequence} =~ /$regexp/ig ) {
push @positions, ($-[0] + 1 );
}
return(@positions);
}


Finally, here are some helper methods that report on the attributes
of sequence and of the calculated map for a given enzyme (or the
entire hash that contains the map of all the enzymes). I
didn't use AUTOLOAD here because
there are only a handful of attributes that return a few different
data types: array, scalar string, and hash.


sub get_enzyme_map {
my($self, $enzyme) = @_;
@{$self->{_map}{$enzyme}};
}
sub get_enzyme_names {
my($self) = @_;
keys %{$self->{_map}};
}
sub get_sequence {
my($self) = @_;
$self->{_sequence};
}
sub get_map {
my($self) = @_;
%{$self->{_map}};
}


5.3.1.3 Documentation



Here's
the (bare bones) documentation for the class embedded right in the
module using the POD documentation language:


=head1 Restriction
Restriction: Given a Rebase object, sequence, and list of restriction enzyme
names, return the locations of the recognition sites in the sequence
=head1 Synopsis
use Restriction;
use Rebase;
use strict;
use warnings;
my $rebase = Rebase->new(
dbmfile => 'BIONET',
bionetfile => 'bionet.212'
);
my $restrict = Restriction->new(
rebase => $rebase,
enzyme => 'EcoRI, HindIII',
sequence => 'ACGAATTCCGGAATTCG',
);
print "Locations for EcoRI are ", join(' ',
$restrict->get_enzyme_map('EcoRI')), "\n";
=head1 AUTHOR
James Tisdall
=head1 COPYRIGHT
Copyright (c) 2003, James Tisdall
=cut
1;


Saving the Synopsis example in a file and running
it (making sure the required bionet.212 file is in
place) gives the output:


EcoRI data in Rebase is GAATTC GAATTC
Sequence is ACGAATTCCGGAATTCG
Locations for EcoRI are 3 11


/ 156