Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

5.3 Restriction.pm: Finding Recognition Sites

The
time has now come to write the class
that creates an object out of a sequence, enzyme name(s), and the map
of the location(s) of the enzyme recognition sites in the sequence.

This module depends a great deal on the module
Rebase developed in the previous section, but it
is fairly short because it just tries to do a small job. This new
Restriction class takes a
Rebase object (which has the Rebase database
translated to regular expressions), some sequence, and a list of
enzymes, and uses the regular expressions to find the recognition
sites in the sequence. (Note that it doesn't use
inheritance; it simply creates a Rebase object to
use.)

In this module, the restriction map (the list of locations where the
enzymes have recognition sites in the sequence) is obtained through
an auxiliary method map_enzyme that simply lists
the locations. Clearly, a more graphical display would be easier and
more useful. I'll consider that as the book
progresses.

5.3.1 The Restriction.pm Module

Here is the Restriction.pm module:

package Restriction;
#
# A class to find locations of restriction enzyme recognition sites in
#  DNA sequence data.
#
use strict;
use warnings;
use Carp;
# Class data and methods
{
# A list of all attributes with default values.
# "enzyme" is given as an argument possibly multiple time, set as key to _map hash
my %_attributes = (
_rebase      => { },  # A Rebase.pm hash-based object
# key   = restriction enzyme name
# value = space-separated string of recognition sites => regular expressions
_sequence    => '', # DNA sequence data in raw format (only bases)
_map         => { },# a hash: keys are enzyme names, 
# values are arrays of locations 
_enzyme      => '', # space- or comma-separated enzyme names, 
# set as key to _map hash
);
# Global variable to keep count of existing objects
my $_count = 0;
# Return a list of all attributes
sub _all_attributes {
keys %_attributes;
}
# Manage the count of existing objects
sub get_count {
$_count;
}
sub _incr_count {
++$_count;
}
sub _decr_count {
--$_count;
}
}

This opening block shows that Restriction is also
a class with a small and simple set of attributes. One is just the
DNA sequence data, _sequence.

The attribute _map is a hash that stores the
computed restriction map with a key for each restriction enzyme and a
value consisting of an array of the positions in the sequence where
recognition sites for that enzyme occur.

The attribute _enzyme is one or more enzyme names
separated by spaces or commas.

The remaining attribute _rebase is an object of
the class I developed in the last section, the class
Rebase. Recall that a Rebase
object gives you the ability to get regular expressions for
recognition sites of restriction enzymes. When you call
Restriction->new you must include as one of its
arguments a Rebase object, as is shown in the
example given in the POD documentation later in this section.

Since a program can have many restriction maps, I also maintain the
count of Restriction objects.

5.3.1.1 Initializing Restriction objects

Next, the new
constructor method creates and initializes
Restriction objects:

# The "new" constructor method, called from class, e.g.
sub new {
my ($class, %args) = @_;
# Create a new object
my $self = bless {  }, $class;
# Set the attributes for the provided arguments
foreach my $attribute ($self->_all_attributes(  )) {
# E.g. attribute = "_name",  argument = "name"
my($argument) = ($attribute =~ /^_(.*)/);
if (exists $args{$argument}) {
if($argument eq 'enzyme') {
# permit space or comma separated enzyme names
$args{$argument} =~ s/,/ /g;
}
$self->{$attribute} = $args{$argument};
}
}
# Check that the correct arguments are given
if( not defined $self->{_rebase} ) {
croak "A Rebase object must be given as an argument";
}elsif( ref($self->{_rebase}) ne 'Rebase' ) {
croak "The argument to rebase is not a Rebase object";
}elsif( not defined $self->{_sequence} ) {
croak "A sequence must be given as an argument";
}
# Calculate the locations for each enzyme, store in _map hash attribute
foreach my $enzyme (split(" ", $self->{_enzyme})) {
$self->map_enzyme($enzyme);
}
$self->_incr_count;
return $self;
}
# For this simple class I have no AUTOLOAD or DESTROY
# No get_rebase method, I don't want to pass around a huge hash
# No set mutators: all initialization done by way of "new" constructor
# No clone method.  Each sequence and set of enzymes can be easily calculated
#  by means of a "new" command.

5.3.1.2 The methods explained

The new constructor method checks that the proper
and required arguments are provided: a sequence, enzyme(s), and a
Rebase object.

Then the new constructor performs the required
computation by calling the method map_enzyme for
each enzyme to determine the (possibly empty) array of locations in
which the enzyme can cut the sequence. These enzyme-array key/value
pairs are stored in the _map attribute.

As you can see in the next segment of code from
Restriction.pm, the method
map_enzyme works by getting the regular
expressions that have been calculated for the enzyme and then finding
the positions where the regular expressions match the sequence.

The get_regular_expressions method accesses the
Rebase object that was passed to the
Restriction->new constructor method and appears
as the $self->{_rebase} attribute. In that
object, the attribute _rebase is the hash of the
database, which looks for the value of the key
$enzyme.

The match_positions method puts a match of a
regular expression in a while loop, where it loops
once for each location it finds. The offset of the matching sequence
is available in the special variable $-[0] after a
successful regular-expression match (see the
perlvar section of the Perl documentation for more
details).

sub map_enzyme {
my($self, $enzyme) = @_;
my(@positions) = (  );
my(@res) = $self->get_regular_expressions($enzyme);
foreach my $re (@res) {
push @positions, $self->match_positions($re);
}
@{$self->{_map}{$enzyme}} = @positions;
return @positions;
}
sub get_regular_expressions {
my($self, $enzyme) = @_;
my(%sites) = split(' ', $self->{_rebase}{_rebase}{$enzyme});
# May have duplicate values
return values %sites;
}
# Find positions of a regular expression in the sequence
sub match_positions {
my($self, $regexp) = @_;
my @positions = (  );
# Determine positions of regular expression matches
while ( $self->{_sequence} =~ /$regexp/ig ) {
push @positions, ($-[0] + 1 );
}
return(@positions);
}

Finally, here are some helper methods that report on the attributes
of sequence and of the calculated map for a given enzyme (or the
entire hash that contains the map of all the enzymes). I
didn't use AUTOLOAD here because
there are only a handful of attributes that return a few different
data types: array, scalar string, and hash.

sub get_enzyme_map {
my($self, $enzyme) = @_;
@{$self->{_map}{$enzyme}};
}
sub get_enzyme_names {
my($self) = @_;
keys %{$self->{_map}};
}
sub get_sequence {
my($self) = @_;
$self->{_sequence};
}
sub get_map {
my($self) = @_;
%{$self->{_map}};
}

5.3.1.3 Documentation

Here's
the (bare bones) documentation for the class embedded right in the
module using the POD documentation language:

=head1 Restriction
Restriction: Given a Rebase object, sequence, and list of restriction enzyme
names, return the locations of the recognition sites in the sequence
=head1 Synopsis
use Restriction;
use Rebase;
use strict;
use warnings;
my $rebase = Rebase->new(
dbmfile => 'BIONET',
bionetfile => 'bionet.212'
);
my $restrict = Restriction->new(
rebase => $rebase,
enzyme => 'EcoRI, HindIII',
sequence => 'ACGAATTCCGGAATTCG',
);
print "Locations for EcoRI are ", join(' ', 
$restrict->get_enzyme_map('EcoRI')), "\n";
=head1 AUTHOR
James Tisdall
=head1 COPYRIGHT
Copyright (c) 2003, James Tisdall
=cut
1;

Saving the Synopsis example in a file and running
it (making sure the required bionet.212 file is in
place) gives the output:

EcoRI data in Rebase is GAATTC GAATTC
Sequence is ACGAATTCCGGAATTCG
Locations for EcoRI are 3 11

Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی