Mastering Perl for Bioinformatics [Electronic resources] (text version)

3.7 Gene2.pm: A Second Example of a Perl Class

Gene1 demonstrated the fundamentals of a Perl class. Now,
I'll build a more realistic example, which also
includes a few additional standard Perl techniques.

My goal is to present an example that you can imitate in order to
begin to develop your own OO software. I'm going to
build the example in three more stages, expanding upon the
Gene1.pm module. First, I'll add mutators, which are methods that
alter the data in an object. I'll also add a method that gives
information about the class as a whole, returning the count of how
many objects in the class exist in the running program. This depends
on the use of closures, subroutines that use variables declared
outside themselves. This is the new material in the Gene2.pm module.

After that step, I introduce the AUTOLOAD
mechanism, which gives a single class method called
AUTOLOAD that can define large numbers of other
methods and significantly reduce the amount of coding you need to
write to develop a more complex object (among other benefits to be
described later). That will be the Gene3.pm
module.

We'll end up with a Gene.pm
module you can use as a basis for your own Perl module development.
It will add a mechanism to specify what properties each attribute has
(which can prevent improper data manipulation, for instance). It will
show how to initialize an object with class defaults and how to clone
an existing object. Finally, Gene.pm will show you
how to incorporate the documentation for a class right in the Perl
code for the class.

Here is the code for the intermediate Gene2.pm
module. Following the Gene2.pm module is an
example of the code and output of a small test program that drives
the module. Take a minute to look at these two code examples,
especially at the comments. The module Gene2.pm
contains several new details that will be discussed following the
code. The test program should be fairly easy to read and understand.

package Gene2;
#
# A second version of the Gene.pm module
#
use strict;
use warnings;
use Carp;
# Class data and methods, that refer to the collection of all objects
# in the class, not just one specific object
{
    my $_count = 0;
    sub get_count {
        $_count;
    }
    sub _incr_count {
        ++$_count;
    }
    sub _decr_count {
        --$_count;
    }
}
# The constructor for the class
sub new {
    my ($class, %arg) = @_;
    my $self = bless {
        _name        => $arg{name}       || croak("Error: no name"),
        _organism    => $arg{organism}   || croak("Error: no organism"),
        _chromosome  => $arg{chromosome} || "????",
        _pdbref      => $arg{pdbref}     || "????",
    }, $class;
    $class->_incr_count();
    return $self;
}
# Accessors, for reading the values of data in an object
sub get_name        { $_[0] -> {_name}       }
sub get_organism    { $_[0] -> {_organism}   }
sub get_chromosome  { $_[0] -> {_chromosome} }
sub get_pdbref      { $_[0] -> {_pdbref}     }
# Mutators, for writing the values of object data
sub set_name {
    my ($self, $name) = @_;
    $self -> {_name} = $name if $name;
}
sub set_organism {
    my ($self, $organism) = @_;
    $self -> {_organism} = $organism if $organism;
}
sub set_chromosome {
    my ($self, $chromosome) = @_;
    $self -> {_chromosome} = $chromosome if $chromosome;
}
sub set_pdbref {
    my ($self, $pdbref) = @_;
    $self -> {_pdbref} = $pdbref if $pdbref;
}
1;

Here is the small test program testGene2, which
demonstrates how to use the objects and methods of this version
of our OO class, Gene2:

#!/usr/bin/perl
#
# Test the second version of the Gene module
#
use strict;
use warnings;
# Change this line to show the folder where you store Gene2.pm
use lib "/home/tisdall/MasteringPerlBio/development/lib";
use Gene2;
#
# Create object, print values
#
print "Object 1:\n\n";
my $obj1 = Gene2->new(
    name          => "Aging",
    organism      => "Homo sapiens",
    chromosome    => "23",
    pdbref        => "pdb9999.ent"
);
print $obj1->get_name, "\n";
print $obj1->get_organism, "\n";
print $obj1->get_chromosome, "\n";
print $obj1->get_pdbref, "\n";
#
# Create another object, print values ... some will be unset
#
print "\n\nObject 2:\n\n";
my $obj2 = Gene2->new(
    organism    => "Homo sapiens",
    name        => "Aging",
);
print $obj2->get_name, "\n";
print $obj2->get_organism, "\n";
print $obj2->get_chromosome, "\n";
print $obj2->get_pdbref, "\n";
#
# Reset some of the values, print them
#
$obj2->set_name("RapidAging");
$obj2->set_chromosome("22q");
$obj2->set_pdbref("pdf9876.ref");
print "\n\n";
print $obj2->get_name, "\n";
print $obj2->get_organism, "\n";
print $obj2->get_chromosome, "\n";
print $obj2->get_pdbref, "\n";
print "\nCount is ", Gene2->get_count, "\n\n";
#
# Create another object, print values: but this fails
# because the "name" value is required (see the "new"
# constructor in Gene2.pm)
#
print "\n\nObject 3:\n\n";
my $obj3 = Gene2->new(
    organism      => "Homo sapiens",
    chromosome    => "23",
    pdbref        => "pdb9999.ent"
);
print "\nCount is ", Gene2->get_count, "\n\n";

Finally, here's the output from the test program
testGene2:

Object 1:
Aging
Homo sapiens
23
pdb9999.ent
Object 2:
Aging
Homo sapiens
????
????
RapidAging
Homo sapiens
22q
pdf9876.ref
Count is 2
Object 3:
Error: no name at testGene2 line 68

It's a good idea to take a moment to read through
this Gene2.pm module, the test program
testGene2, and the output. Compare this new
Gene2 module with the earlier
Gene1 module. In particular, notice where the
methods are defined in the module, and then how they are actually
used in the test program. Don't get hung up on the
details in this first reading; just look at the overall picture.
Notice that the definitions are all in the module
Gene2.pm, which is then loaded at the beginning of
the test program testGene2; it is
testGene2 that actually creates the
module's objects and uses the
module's methods on those objects. In other words,
testGene2 is a program;
Gene2.pm is a definition of a class that is used
in testGene2.

Let's begin examining the module code.

3.7.1 Closures

A closure keeps track of class data. Class data
refers not to a particular object, but to several, possibly all,
objects of a class that have been created during the running of your
program. Tracking such data is frequently important. For instance, say you
have a DNA sequencing pipeline that can handle only 20 sequences at
any one time. You'd want your controlling program to
block any attempt to create more than 20 sequence objects until the
pipeline is ready to receive more. To do this, you would keep a count
of how many sequence objects your controlling program has created.
Closures are a way to program such class data.
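
The pipeline scenario can be sketched as follows. This is only an illustration under stated assumptions: the class name SeqJob, its sequence argument, and the variable $_max are hypothetical and aren't part of the Gene modules. The constructor refuses to create an object once the count kept in the closure reaches the limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class for illustration only: SeqJob is not part of the Gene modules
package SeqJob;
use Carp;

{
    my $_count = 0;     # class data: how many jobs have been created
    my $_max   = 20;    # assumed pipeline capacity

    sub get_count { $_count }

    sub new {
        my ($class, %arg) = @_;
        # Refuse to create an object once the limit is reached
        croak("Error: pipeline full") if $_count >= $_max;
        ++$_count;
        return bless { _sequence => $arg{sequence} }, $class;
    }
}

package main;

# Fill the pipeline to capacity
my @jobs;
push @jobs, SeqJob->new(sequence => 'ACGT') for 1 .. 20;
print "Jobs created: ", SeqJob->get_count, "\n";

# A further request is refused; eval traps the croak
my $extra = eval { SeqJob->new(sequence => 'ACGT') };
print "Refused: $@" unless $extra;
```

A real controlling program would also need some way to lower the count when a job finishes; this sketch leaves that out.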

A closure is a subroutine that uses a lexical (my) variable
declared outside the subroutine, in an enclosing scope. By surrounding
such a variable and some closures that use that variable within a
block, you can use the closures to access the variable from anywhere
in the program, and the variable keeps its value even though its name
is no longer visible once the block ends. This section
will explain how this works and how to use it in your code.

The following code is new in Gene2.pm:

# Class data and methods, that refer to the collection of all objects
# in the class, not just one specific object
{
    my $_count = 0;
    sub get_count {
        $_count;
    }
    sub _incr_count {
        ++$_count;
    }
    sub _decr_count {
        --$_count;
    }
}

This code creates a variable $_count.
$_count is a lexical my
variable in a block of curly braces, and therefore is hidden from all
parts of the code except within the block. The three methods that are
also defined in the same block use the variable
$_count. This variable persists throughout the life
of the program because the subroutines defined with it are closures.
For example, in the code for the class module
Gene2.pm, I use $_count to keep
a count of how many objects are in existence at any given time.
Notice that the method names _incr_count and
_decr_count begin with a leading underscore, as
does the variable name $_count. They
aren't meant to be called by the user of the class
but are internal to the module. On the other hand, the remaining
method get_count doesn't begin
with a leading underscore and is meant to be called whenever the user
of the class wants to know what the count is.

The previous section of code implements a closure. It is surrounded
by curly braces creating a Perl block.
You've seen many blocks associated with loops and
conditionals as you learned the fundamentals of Perl. The block here
stands on its own without being a part of another programming
construct.

Any block, this one included, creates a new
scope for the variables that occur within it.
my variables (also called
lexical variables) within a
block exist only while the program is executing the statements within
that block. When a program leaves a block by passing beyond its
closing curly brace, the my variables within it go
out of scope. In other words, they cease to exist, and disappear from
the program until the program reenters the block, and they are
created anew.

The preceding paragraph is correct; however, there is one important
"but."

Subroutine definitions don't go out of scope in the
way that lexically scoped (my) variables do. It is
also possible for a subroutine definition to affect the behavior of a
lexically scoped variable. Aha. Read on.

To repeat: subroutine definitions aren't subject to
the same constraints as variables with regard to my
and blocks. In fact, a named subroutine definition is global to the
entire package in which it's declared. Perl looks for
subroutine definitions at compile-time, before actually running the
program, and makes a subroutine definition available to an entire
package no matter where the subroutine is declared, even if
it's declared in a conditional block
that's never reached at runtime, when the
program code is actually executed.

As an example, here is a small program with a subroutine definition:

#
# A program to demonstrate the global nature of subroutine definitions
#
my $dna = 'ACGT';
if ($dna eq 'ACGT') {
    print "This statement gets executed\n";
    print "Here's the subroutine call:\n";
    isdna($dna);
} else {
    print "This statement does not get executed\n";
    #
    # The following subroutine definition is in a block which is
    # never executed at runtime.
    #
    sub isdna {
        # Print the argument if it is DNA
        if ($_[0] =~ /^[ACGT]+$/i) {
            print $_[0], "\n";
        } else {
            return 0;
        }
    }
}

This produces the following output:

This statement gets executed
Here's the subroutine call:
ACGT

As you see, even though the subroutine definition is buried in a
block that's never entered, not even once, it is
still available to the program. Perl scans the program at
compile-time, reads in any subroutine definition no matter where it
is, and the subroutine definition is then available to be called from
anywhere in the program at runtime.

Continuing on, in the code from Gene2.pm under
consideration, there's the variable definition:

my $_count = 0;

which occurs outside the following subroutine definitions such as:

sub _incr_count {
    ++$_count;
}

The variable $_count is declared outside the
subroutine _incr_count, but the subroutine uses
the variable. Therefore, by definition, the subroutine
_incr_count is a closure.

There's just one more piece to the puzzle. Consider
again the code fragment from Gene2.pm, which I
repeat here:

# Class data and methods, that refer to the collection of all objects
# in the class, not just one specific object
{
    my $_count = 0;
    sub get_count {
        $_count;
    }
    sub _incr_count {
        ++$_count;
    }
    sub _decr_count {
        --$_count;
    }
}

It seems that when the program leaves the block that encloses this
code, the variable $_count should go out of scope
and no longer be available to the program. However, in
Gene2.pm the $_count variable
doesn't cease to exist.

Because the subroutine definitions in this block are global, and
because they also reference the variable $_count,
Perl knows that at any point in the program you can put in a call to,
say, get_count, which in turn needs the variable
$_count to execute. Each closure holds a reference to
$_count, and Perl destroys a variable only when nothing
refers to it any longer, so the variable survives after the program
leaves the block. At any point in the program, the value of
$_count can be obtained by calling the subroutine.
However, the value of $_count
can't be accessed in any other way than by
get_count or another closure defined within the same
block.
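
Here is a minimal standalone sketch of that behavior, using the same subroutine names as Gene2.pm but outside any class. The package variable declared with our at the end is an assumption added purely to show that the name $_count outside the block refers to a different variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my $_count = 0;                 # visible by name only inside this block
    sub get_count   { $_count }
    sub _incr_count { ++$_count }
}

# The block has been left, but the closures still share the same $_count
_incr_count() for 1 .. 3;
print "Count is ", get_count(), "\n";

# This is a separate package variable that happens to have the same name;
# assigning to it does not touch the lexical used by the closures
our $_count = 99;
print "Count is still ", get_count(), "\n";
```

Both print statements report 3, since the only way to change the closed-over variable is through the subroutines defined in the block.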

To summarize, by defining a variable and a closure that uses that
variable within a block, a program can limit access to that variable
to calls by the closures. This is exactly what I want to do in
setting up class methods that refer to the collection of all objects
that are in use.

In Gene2.pm, I want to initialize the count of
objects to 0 when the program starts and then increment it by one
each time a new object is created. By defining
_incr_count as a closure, I can call it from
within the new object constructor, ensuring that
the variable $_count will keep an accurate count
of the number of objects that are created.

3.7.2 Tracking Class Data from the Constructor Method

In this
second version of the class, I just have to make a small change to
the constructor method, the subroutine new.

Here is the modified new method
constructor:

# The constructor for the class
sub new {
    my ($class, %arg) = @_;
    my $self = bless {
        _name        => $arg{name}       || croak("Error: no name"),
        _organism    => $arg{organism}   || croak("Error: no organism"),
        _chromosome  => $arg{chromosome} || "????",
        _pdbref      => $arg{pdbref}     || "????",
    }, $class;
    $class->_incr_count();
    return $self;
}

First, I create the object by blessing (and
initializing) an anonymous hash, as before. This time, however,
I'll save the object in the lexical variable
$self. This allows me to add a call to the class
method _incr_count in order to keep track of the
total number of objects created. I'll then return
the object $self from the subroutine.
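
The Gene2.pm listing defines _decr_count but never calls it, so the count only goes up. One possible way to make the count follow the objects still in existence is a DESTROY method, which Perl calls automatically when the last reference to an object goes away. The following is only a sketch under that assumption; the class name GeneCounted is hypothetical, and the DESTROY method isn't part of Gene2.pm as shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for Gene2, reduced to one attribute
package GeneCounted;

{
    my $_count = 0;
    sub get_count   { $_count }
    sub _incr_count { ++$_count }
    sub _decr_count { --$_count }
}

sub new {
    my ($class, %arg) = @_;
    my $self = bless { _name => $arg{name} }, $class;
    $class->_incr_count();
    return $self;
}

# Assumption: decrement the class count whenever an object is destroyed
sub DESTROY {
    my ($self) = @_;
    $self->_decr_count();
}

package main;

my $gene1 = GeneCounted->new(name => 'Aging');
{
    my $gene2 = GeneCounted->new(name => 'RapidAging');
    print "Inside block: ", GeneCounted->get_count, "\n";   # two objects alive
}
# $gene2 went out of scope, so its DESTROY method has run
print "After block: ", GeneCounted->get_count, "\n";        # one object alive
```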

3.7.3 Accessor and Mutator Methods

In the first version of the class, Gene1.pm, I printed the values
stored in an object by calling simple accessor methods such as
get_name.

In this new version, Gene2.pm, I have the same
accessor methods, one for each attribute whose value I may want to
see. I also include mutators,
which are subroutines that enable the user of the class to alter the
values of attributes of an object.

Here are the accessor and mutator methods for
Gene2.pm:

# Accessors, for reading the values of data in an object
sub get_name        { $_[0] -> {_name}       }
sub get_organism    { $_[0] -> {_organism}   }
sub get_chromosome  { $_[0] -> {_chromosome} }
sub get_pdbref      { $_[0] -> {_pdbref}     }
# Mutators, for writing the values of object data
sub set_name {
    my ($self, $name) = @_;
    $self -> {_name} = $name if $name;
}
sub set_organism {
    my ($self, $organism) = @_;
    $self -> {_organism} = $organism if $organism;
}
sub set_chromosome {
    my ($self, $chromosome) = @_;
    $self -> {_chromosome} = $chromosome if $chromosome;
}
sub set_pdbref {
    my ($self, $pdbref) = @_;
    $self -> {_pdbref} = $pdbref if $pdbref;
}

The mutators collect two arguments. The first is the reference to the
object, which as before, is passed automatically to the method when
it is invoked (using the method set_name as an
example):

$obj->set_name('hairy');

The second argument collected is then the first argument given to the
call, in this case, setting the gene name to
hairy.

The work of the subroutine is accomplished by the line:

$self -> {_name} = $name if $name;

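It simply sets the internal _name attribute to the
supplied name (hairy in this example) if the
argument $name is supplied and is a true value. If
it's not supplied, the subroutine does nothing. Because the test is
for truth rather than definedness, a false value such as 0 or the
empty string is also ignored.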
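
The difference between testing for truth and testing for definedness can be seen in a short sketch. The class GeneSketch and the method set_chromosome_defined are hypothetical names introduced only for this comparison; Gene2.pm itself uses the truth test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class for comparing the two tests
package GeneSketch;

sub new {
    my ($class) = @_;
    return bless { _chromosome => "????" }, $class;
}

sub get_chromosome { $_[0]->{_chromosome} }

# Truth test, as in Gene2.pm: "0" and "" are silently ignored
sub set_chromosome {
    my ($self, $chromosome) = @_;
    $self->{_chromosome} = $chromosome if $chromosome;
}

# Variant (an assumption, not the book's code): accept any defined value
sub set_chromosome_defined {
    my ($self, $chromosome) = @_;
    $self->{_chromosome} = $chromosome if defined $chromosome;
}

package main;

my $gene = GeneSketch->new;

$gene->set_chromosome("0");
print $gene->get_chromosome, "\n";    # still ????, because "0" is false

$gene->set_chromosome_defined("0");
print $gene->get_chromosome, "\n";    # now 0
```

Which test is appropriate depends on whether a value such as 0 is meaningful for the attribute.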

Again, you see that the internal representation of the attributes of
the object is hidden from the class's user.
Altering an object's attributes is done with
methods; the class author is then free to alter the way in which the
attributes are stored, without changing the Application Programming
Interface (API), the interface of the class to the outside world. If
you use this class, you don't have to change your
code when a new version of the class is written.
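
To illustrate the point, here is a sketch of two hypothetical classes, GeneHash and GeneArray, that store the name differently (in a hash and in an array) but offer the same methods. Neither class is from the Gene modules; they exist only to show that the calling code is identical for both.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class: attributes stored in a hash
package GeneHash;
sub new      { my ($class, %arg) = @_; return bless { _name => $arg{name} }, $class }
sub get_name { $_[0]->{_name} }
sub set_name { my ($self, $name) = @_; $self->{_name} = $name if $name }

# Hypothetical class: same interface, attributes stored in an array
package GeneArray;
sub new      { my ($class, %arg) = @_; return bless [ $arg{name} ], $class }
sub get_name { $_[0]->[0] }
sub set_name { my ($self, $name) = @_; $self->[0] = $name if $name }

package main;

# The calling code does not depend on how the attribute is stored
my @names;
for my $class (qw(GeneHash GeneArray)) {
    my $obj = $class->new(name => 'Aging');
    $obj->set_name('hairy');
    print "$class: ", $obj->get_name, "\n";
    push @names, $obj->get_name;
}
```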

The test program testGene2 is similar to
testGene1, with the addition of examples of the
class mutators.
