Mastering Perl for Bioinformatics [Electronic resources] (text version)

3.6 Details of the Gene1 Class

In this section, I introduce the OO features used to make a class in
Perl. First, however, I explain the variable naming convention I use,
as well as the handy Carp module.

3.6.1 Variable Names and Conventions

Using an underscore in front of a name is a programming convention
that usually indicates that the item in question (e.g., a variable or
hash key) isn't meant for the outside world but only
for internal use.

This is just a convention; Perl doesn't require you
to do it. It will, however, make your code easier to read and
understand.

I generally follow this convention and put underscores in front of
names that I don't want directly accessed by the
programmer using the class. (In Perl, unlike some stricter OO
languages, you can access data that's internal to a
class, which makes this naming convention for distinguishing internal
variables particularly useful.)

Thus, in my Gene1 class, the attributes
_name, _organism,
_chromosome, and _pdbref are
used internally only as the hash keys for the attributes in the
object. When you use the class, as I do in my example program
testGene1, you don't even have to
know these names exist.

The interface is through arguments that specify the initialization
values of these attributes. These arguments are called
name, organism,
chromosome, and pdbref. I also
have methods (subroutines also called name, organism,
chromosome, and pdbref) that return the values of the actual
attributes stored in the object.

3.6.2 Carp and croak

The Carp module is loaded near the top of
Gene1.pm with use Carp;.

Carp is a standard Perl module that provides
informative error messages in the case of problems.
carp prints a warning message;
croak prints an error message and dies. They
are very much like the Perl functions warn and
die, except that they report the problem from the
caller's point of view: the file name and line number in the message
are those of the code that called the subroutine, not those of the
line where carp or croak itself appears. I use croak in my code; it
prints out the error message provided, names the file and the line
number from which the enclosing subroutine was called, and
then kills the program.

This function is certainly useful during development because
it's another way to find errors in a program as
it's being created. It also gives program users the
ability to report the exact location of a problem, should one occur,
to the programming staff (which may be just one programmer, you!).

In my program output, the Carp message is:

no name at testGene1 line 35

It's produced by the line:

_name     => $arg{name}    || croak("no name"),

in the Gene1.pm module file. Line 35 of
testGene1 is the beginning line of this part of
the program:

my $obj3 = Gene1->new(
    organism        => "Homo sapiens",
    chromosome      => "23",
    pdbref          => "pdb9999.ent"
);

It's the part of the code that tries to do something
bad: it's trying to initialize a new object without
setting its name. You'll see how this works in more
detail in the following sections.
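The difference between die and croak can be seen in a minimal sketch like the one below. The package Demo and its two subroutines are hypothetical names invented for this illustration; they are not part of Gene1.pm. The subroutines live in their own package because Carp skips over callers in the same package when deciding which line to report.

```perl
use strict;
use warnings;

{
    # Hypothetical package, used only for this illustration
    package Demo;
    use Carp;

    sub with_die   { die   "no name" }   # message names this line
    sub with_croak { croak "no name" }   # message names the caller's line
}

# Trap each error with eval so that both messages can be compared
my $die_msg   = do { eval { Demo::with_die()   }; $@ };
my $croak_msg = do { eval { Demo::with_croak() }; $@ };

print "die:   $die_msg";
print "croak: $croak_msg";
```

The die message points at the line inside Demo, whereas the croak message points at the line in the calling code, which is usually where the mistake has to be fixed.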

3.6.3 The new Constructor Method

To create objects, I defined a special constructor
method called new. A call
to new returns a new object, properly initialized.
The new object is also marked as a member of the class, in this case
the class Gene1.

sub new {
    my ($class, %arg) = @_;
    return bless {
        _name       => $arg{name}        || croak("no name"),
        _organism   => $arg{organism}    || croak("no organism"),
        _chromosome => $arg{chromosome}  || "????",
        _pdbref     => $arg{pdbref}      || "????",
    }, $class;
}

Note that each class may have its own requirements for creating an
object of the class, and so each class's constructor method
may differ from that of another class.[3] For instance, a
constructor may or may not provide default values for its attributes.
Still, there's a lot of similarity among the
constructor methods of most classes.

[3] In
particular, a constructor method may have any name in Perl; you could
call it constructor,
OverTheSun, or anything that you choose. Most
programmers just use the very familiar name
new.

Let's dissect the code of the constructor
new. You'll see how
objects are marked as members of a class, and initialized, by their
constructor methods. Here are the main novelties:

The package name Gene1 is automatically passed to
the subroutine new as its first argument, even
though it isn't included in the argument list.

The returned hash reference is marked with the name
Gene1 (using the bless
function) thus making it an object in the Gene1
class.

Everything else here is straightforward Perl subroutine code.

Note that the call to new in the demonstration
program testGene1 is made as follows:

my $obj = Gene1->new( ... );

The scalar variable $obj is a reference that
points to the anonymous hash that's returned from
the new method. The object is a hash that contains
the attributes of the object, namely the
key/value pairs of the hash. As usual, the
reference variable $obj is lexically scoped with
my. And, as you'll see, the hash that $obj refers to is
marked with the class name
Gene1.

The call Gene1->new includes the name of the
package Gene1 in which the new
subroutine is defined. The package name is the class name; the name
of the module file in which the class is defined must be the package
name with .pm added. So you have a class
Gene1 in a module file Gene1.pm
that has the declaration package
Gene1;.

The call to new with its arguments is of the form:

Gene1->new( key1 => 'value1', key2 => 'value2', ... )

This call does two important things:

It calls the new subroutine in the
Gene1 package.

It passes the name Gene1 of the package to the
new subroutine as its first argument. Therefore,
in the new subroutine, in the line that collects
the arguments:

        my ($class, %arg) = @_;

the first argument is automatically the string
Gene1 and is assigned to the variable
$class.

This first argument Gene1 isn't
listed in the usual place in the parentheses after the subroutine
name in the call to the subroutine:

new( key1 => 'value1', key2 => 'value2', ... )

It happens automatically when the package name is used with an arrow
(->):

Gene1->new( key1 => 'value1', key2 => 'value2', ... )

This may seem a bit odd, but it has the desirable advantage of making
it unnecessary to type the class name Gene1 twice:
once to call the new method in the
Gene1 package, and again to pass the class name
Gene1 to the new method.
Instead of typing:

Gene1->new( "Gene1", key1 => 'value1', key2 => 'value2', ... )

you can just type:

Gene1->new( key1 => 'value1', key2 => 'value2', ... )

It's simply a bit of handy syntax the designers of
Perl added to save a bit of typing when writing OO code in Perl,
nothing more or less.
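The equivalence of the two call forms can be checked with a small sketch. The package Demo and its method show_args are hypothetical and exist only to echo back whatever arguments they receive.

```perl
use strict;
use warnings;

{
    # Hypothetical class whose only method returns its argument list
    package Demo;
    sub show_args { return @_ }
}

# Arrow call: Perl supplies the class name as the first argument
my @via_arrow = Demo->show_args( key1 => 'value1' );

# Plain subroutine call: the class name has to be passed by hand
my @via_plain = Demo::show_args( 'Demo', key1 => 'value1' );

print "@via_arrow\n";    # prints: Demo key1 value1
print "@via_plain\n";    # prints: Demo key1 value1
```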

Now, let's examine the innards of the
new constructor method. The new
constructor method has the form:

sub new {
    my ($class, %arg) = @_;
    return bless {
        ...
    }, $class;
}

First, notice that in addition to assigning the first argument, the
class name Gene1, to the variable
$class, the subroutine captures the rest of the
arguments in the hash variable %arg.

Recall, from your previous study of Perl, that initializing a hash by
assigning a list to it causes the items in the list to be treated as
key/value pairs in the hash. For example, if the arguments are:

('Myclass', mykey1 => 'myvalue1', mykey2 => 'myvalue2')

the scalar variable $class gets the value
Myclass, and the hash variable
%arg gets two key/value pairs initialized to the
key 'mykey1' with the value
'myvalue1', and the key
'mykey2' with the value
'myvalue2'. Also recall that => is a synonym
for a comma.[4]

[4] It also forces its left side to be
interpreted as a string and removes the need to surround the string
in quotes, which is exactly what I want here.
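As a quick check of this argument handling, here is a minimal sketch. The subroutine unpack_args is a hypothetical stand-in that collects its arguments the same way new does and hands them back for inspection.

```perl
use strict;
use warnings;

# Hypothetical subroutine that unpacks @_ the way the constructor does
sub unpack_args {
    my ($class, %arg) = @_;      # first item to $class, the rest into %arg
    return ($class, \%arg);
}

my ($class, $arg) = unpack_args(
    'Myclass', mykey1 => 'myvalue1', mykey2 => 'myvalue2'
);

print "class: $class\n";
for my $key (sort keys %$arg) {  # sorted, because hash order is not fixed
    print "$key => $arg->{$key}\n";
}
```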

3.6.4 Creating an Object with bless

The
new constructor then returns the value of:

bless { ... }, $class;

The built-in Perl function bless does a very
simple thing, but it's enough to take a data
structure and make it an object in a class. It marks a reference with
a class (package) name.

In this code, bless takes two arguments. The
first, delimited by a pair of curly braces, is an anonymous hash,
which you'll recall is a reference to an unnamed
hash. This anonymous hash contains the data of the resulting object.
The second argument to bless is just the name of
the class, as it was saved in the $class scalar
variable.

This call to bless returns the reference to the hash, which is now
"marked" with the name of the
class. The blessed reference is then given to
return to serve as the returned value
of the new method.

The object reference that is returned can now be identified as an
object in the class Gene1. The object reference in
this example is marked with the name Gene1 and has
a hash as its top-level data structure. The new
method in the class creates a new object in the class.

Although the first argument to bless in this code
is an anonymous hash, in general it can be any reference to a data
structure that serves as an object. It can be a reference to a
scalar, an array, a hash, or a more complex data structure. In the
example, I am just declaring an anonymous hash in place rather than
providing a reference to an existing hash. So, for example, if I
declare a hash and a reference to it like so:

%hash = ( key1 => 'value1', key2 => 'value2' );
$hashref = \%hash;

then I can bless the hash, mark it with the class
name HashClass, and save the resulting object:

$hashobj = bless $hashref, 'HashClass';

Alternatively, an equivalent object $hashobj can be
created using an anonymous hash, and one call to
bless:

$hashobj = bless { key1 => 'value1', key2 => 'value2' }, 'HashClass';
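The following sketch blesses three different kinds of reference. The class names HashClass, ArrayClass, and ScalarClass are placeholders; bless accepts any package name, whether or not that package defines any methods.

```perl
use strict;
use warnings;

# Blessing a reference to an existing, named hash
my %hash    = ( key1 => 'value1', key2 => 'value2' );
my $hashobj = bless \%hash, 'HashClass';

# Blessing an anonymous array
my $arrayobj = bless [ 'value1', 'value2' ], 'ArrayClass';

# Blessing a reference to a scalar
my $count     = 0;
my $scalarobj = bless \$count, 'ScalarClass';

# The data is still reached through the underlying reference type
print $hashobj->{key1}, "\n";    # value1
print $arrayobj->[1],   "\n";    # value2
print $$scalarobj,      "\n";    # 0

# bless marks the underlying data, so a fresh reference sees the class too
print ref(\%hash), "\n";         # HashClass
```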


3.6.5 Using ref to Report an Object's Class

The Perl function ref reports on the type of element
referred to (variable, object, code, etc.). If the variable is
blessed, ref reports on the
class it is marked with.

After the call to new to create the
Gene1 object $obj, the line:

print ref $obj, "\n";

prints Gene1.

The Perl function ref returns
the empty string (which is false) if its argument isn't a
reference. If it is a reference, it returns one of the following:

SCALAR
ARRAY
HASH
CODE
REF
GLOB
LVALUE

(Later versions of Perl add a few more, such as FORMAT, IO, VSTRING, and Regexp.)

If the reference has been blessed into a package,
that package name is returned from the call to
ref.
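A short sketch shows several of these return values side by side. The object here is a stand-in built directly with bless rather than by the real Gene1->new, so that the example is self-contained.

```perl
use strict;
use warnings;

my $text = 'plain string';
my @list = ( 1, 2, 3 );
my %map  = ( key => 'value' );
my $obj  = bless {}, 'Gene1';    # stand-in for an object from Gene1->new

# Pairs of [ description, value returned by ref ]
my @checks = (
    [ 'string',       ref($text)      ],   # empty string: not a reference
    [ 'scalar ref',   ref(\$text)     ],   # SCALAR
    [ 'array ref',    ref(\@list)     ],   # ARRAY
    [ 'hash ref',     ref(\%map)      ],   # HASH
    [ 'code ref',     ref(sub { 1 })  ],   # CODE
    [ 'ref to a ref', ref(\\$text)    ],   # REF
    [ 'glob ref',     ref(\*STDOUT)   ],   # GLOB
    [ 'object',       ref($obj)       ],   # Gene1
);

for my $check (@checks) {
    my ($what, $type) = @$check;
    printf "%-13s %s\n", $what, $type eq '' ? '(not a reference)' : $type;
}
```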

3.6.6 Initialize an Object with an Anonymous Hash

Here again is the complete definition of the new
method in the Gene1 class:

sub new {
    my ($class, %arg) = @_;
    return bless {
        _name          => $arg{name}         || croak("no name"),
        _organism      => $arg{organism}     || croak("no organism"),
        _chromosome    => $arg{chromosome}   || "????",
        _pdbref        => $arg{pdbref}       || "????",
    }, $class;
}

The first argument to bless is the following
anonymous hash:

{
    _name         => $arg{name}         || croak("no name"),
    _organism     => $arg{organism}     || croak("no organism"),
    _chromosome   => $arg{chromosome}   || "????",
    _pdbref       => $arg{pdbref}       || "????",
}

As should be familiar (if not, see Appendix A for a
Perl refresher), each key is separated from its value by the
"syntactic sugar" symbol
=>. The
keys are in the first column; the strings
_name, _organism,
_chromosome, and _pdbref are
used as the names of the keys.

The desired values are in the second column, following the
=> symbol. Each is given as an expression built with the Perl
logical OR operator: either the value that has been passed in or,
failing that, the default value is used:

value || default

The values are those assigned from the argument list to the hash
%arg upon entry to the subroutine. If all these
arguments are passed to the new method, the hash
initializes its four keys (_name,
_organism, _chromosome, and
_pdbref) with those values.

If chromosome or pdbref isn't
passed to the new method, the corresponding values of
%arg aren't defined, and the
subroutine assigns the default value (the string
????) to the missing keys
(_chromosome, _pdbref, or
both).

If name or organism
isn't passed as an argument to the
new method, its value in
%arg isn't defined, and
the subroutine calls croak and the
program exits with an error message.

Let's look closely at a line in the
Gene1.pm module that calls
croak:

_name            => $arg{name}            || croak("no name"),

This line is part of a hash initialization. It is initializing an
entry with a key _name. The value to be associated
with this key is given as:

$arg{name}        || croak("no name")

This sets the value of the key to the value
$arg{name} if that value is true. If
$arg{name} is false (because it was never passed and so is undefined,
or because it is an empty string or 0), the
expression croak("no name") is evaluated. The behavior
of || (the Boolean or operator)
is that the first argument is evaluated. The second argument is
evaluated only if the first argument evaluates to
false. In this code, the second argument prints an error message and
kills the program when it is evaluated. This is a
bit of a trick, but it's a common one
that's used in several programming languages that
have a short-circuiting Boolean or operator.
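One consequence of this idiom is worth a small sketch: because || tests for truth rather than for existence, a passed-in value of 0 or an empty string is treated as missing. The argument hash below is hypothetical, and the // (defined-or) operator shown for comparison is available in Perl 5.10 and later; it is not what Gene1.pm uses.

```perl
use strict;
use warnings;

# Hypothetical argument hash, as the constructor might receive it
my %arg = ( name => 'Aging', chromosome => 0 );

# || falls through on any false value, so the passed-in 0 is lost
my $with_or = $arg{chromosome} || "????";

# // falls through only when the left side is undefined
my $with_defined_or = $arg{chromosome} // "????";

# A key that was never passed is undefined, so the default is used
my $missing = $arg{pdbref} || "????";

print "with ||: $with_or\n";            # with ||: ????
print "with //: $with_defined_or\n";    # with //: 0
print "missing: $missing\n";            # missing: ????
```

For the Gene1 attributes this rarely matters, since a gene name or organism of 0 would be unusual, but it is something to keep in mind when a false value is legitimate data.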

Now that you've seen how the new
constructor handles its arguments, let's look again
at how the test program testGene1 calls the
new method, which it does three times:

my $obj1 = Gene1->new(
    name            => "Aging",
    organism        => "Homo sapiens",
    chromosome      => "23",
    pdbref          => "pdb9999.ent"
);

my $obj2 = Gene1->new(
    organism        => "Homo sapiens",
    name            => "Aging",
);

my $obj3 = Gene1->new(
    organism        => "Homo sapiens",
    chromosome      => "23",
    pdbref          => "pdb9999.ent"
);

The key/value pairs (the keys are the
attributes of the objects) are passed to the new
method. Notice that, due to the use of the %arg
hash to capture these arguments by new, the order
in which the arguments are passed isn't important.
This is a nice convenience when creating and initializing objects
because there are often many attributes and some may or may not be
initialized; being able to ignore the order of the arguments when you
call new makes it easier to program. Recall that
it's a general property of Perl hashes that the
order of the keys isn't important; it has to do with
how hashes are implemented, and why they're so fast
at retrieving values.

You'll recall that the use of
croak in the new method
requires the initialization of the name and
organism attributes. For instance,
$obj3 isn't created with an
initial value for the name attribute. The
new subroutine was defined to require such an
initial value, which makes sense because, at the least, I want every
gene in my program to have a name and an originating organism. The
output of the testGene1 program shows that this
third call to new triggers the
croak exit mechanism.
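The behavior described above can be tried out in a single file with a sketch like the following. It places the constructor from the text in an inline Gene1 package (rather than a separate Gene1.pm file, which is an assumption made only to keep the example self-contained) and wraps the failing call in eval so that the script continues instead of exiting.

```perl
use strict;
use warnings;

{
    package Gene1;
    use Carp;

    # Constructor as shown in the text
    sub new {
        my ($class, %arg) = @_;
        return bless {
            _name       => $arg{name}       || croak("no name"),
            _organism   => $arg{organism}   || croak("no organism"),
            _chromosome => $arg{chromosome} || "????",
            _pdbref     => $arg{pdbref}     || "????",
        }, $class;
    }
}

# Same attributes, different argument order
my $obj1 = Gene1->new( name => "Aging", organism => "Homo sapiens" );
my $obj2 = Gene1->new( organism => "Homo sapiens", name => "Aging" );

# Peeking at the internal keys here only to compare the two objects
my $same = $obj1->{_name}     eq $obj2->{_name}
        && $obj1->{_organism} eq $obj2->{_organism};
print "argument order made no difference\n" if $same;

# Missing name: croak fires, and eval traps the error
my $obj3  = eval { Gene1->new( organism => "Homo sapiens" ) };
my $error = $@;
print "third call failed: $error" unless defined $obj3;
```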

3.6.7 Accessor Methods

Accessor methods are subroutines in the
class that return the values of an object's attributes. These
attributes are usually implemented as keys of the hash that serves as
the object. You can access the attributes of an object, and
their values, directly; for example, given an object of the
Gene1 class, you can print out its name like so:

print $obj->{_name};

This gives the value of the key _name in the
anonymous hash pointed to by $obj. This works;
however, it's not good OO style. It directly
accesses the data in the object; good style requires you to access
the data through subroutines defined for that purpose. It is
preferable to restrict all access of an object's
attributes to the use of specific methods.

The actual attribute is called _name. This is
initialized from the value of the argument name in
the initialization of the arguments, as in this line from
new:

_name            => $arg{name}       || croak("no name"),

That was just a convenient way to pass arguments to
new, so you can say:

new( name => 'Ecoli' )

instead of:

new( _name => 'Ecoli' )

Likewise, you can define a subroutine called, conveniently,
name that returns the value of the attribute
$obj->{_name}.

In my program, I have defined a method for each key in the hash. I
have a method name, which accesses the value of the
key _name; I also have a similar method for each
other key. Here's how to define a method to access
the value of the key _name:

sub name        { $_[0] -> {_name}        }

This is called by the following line in the
testGene1 program:

print $obj1->name, "\n";

It calls the method name for the object, which
then accesses the value of the key _name in the
object. In this way the actual implementation of the data that is
stored in the object is kept hidden from users of the class methods.
If the data is retrieved with a method, and if the author of the
class decides at a later date to change the way the object stores its
data, the users of the class can still get at the data by making the
same method call. Only the internals of the method will change;
the behavior of the method, namely what arguments you give it and
what return values you expect from it, stays the same. When the
interface remains the same, the code that uses the class can also
remain the same, saving everybody time and trouble, even when new
versions of the class are developed.

The method name receives the object as its first
argument because it is called by:

$obj1->name

The body of the subroutine uses the Perl built-in
@_ array to access its arguments. The first
argument to the subroutine is referred to as
$_[0]. That first argument is the object, a
reference to a hash, so I give it the key _name to
retrieve the desired value:

$_[0] -> {_name}

Finally, since by default a subroutine returns the value of the last
statement executed, this subroutine returns the gene name it has
retrieved from the object.
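To illustrate why hiding the data behind an accessor pays off, here is a minimal sketch with two hypothetical classes, GeneHash and GeneArray, which are not part of the book's code. They store the name differently, yet the calling code is identical because it goes through the name method only.

```perl
use strict;
use warnings;

{
    # Hypothetical first version: the name is stored under a hash key
    package GeneHash;
    sub new  { my ($class, %arg) = @_; return bless { _name => $arg{name} }, $class }
    sub name { $_[0]->{_name} }
}

{
    # Hypothetical later version: the name is stored in an array slot
    package GeneArray;
    sub new  { my ($class, %arg) = @_; return bless [ $arg{name} ], $class }
    sub name { $_[0]->[0] }
}

# The code that uses the classes does not change between versions
for my $class (qw(GeneHash GeneArray)) {
    my $obj = $class->new( name => 'Aging' );
    print ref($obj), ": ", $obj->name, "\n";
}
```

Code that had reached into the object with $obj->{_name} would have broken when the storage changed to an array; code that calls $obj->name keeps working.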
