Mastering Perl for Bioinformatics [Electronic resources] (text version)

5.1 Envisioning an Object


The Rebase project provides a set of files
that specify restriction enzymes, their cut sites,
and a great deal more information. Consider the problem of designing
an object-oriented version of code that uses this data. What will be
the objects and the methods?

Each restriction enzyme has a name; associated with its name are the
definition of its recognition site (which I'll
translate into a Perl regular expression), information about the
chemistry of the restriction enzyme, vendors of the enzyme, and other
annotation. This information is all part of the Rebase database.

Perhaps I should consider each restriction enzyme as a suitable
candidate for my basic object. I can then read in the Rebase
database, creating an object for each restriction enzyme that includes
such attributes as the recognition site, the translation of the
recognition site into a Perl regular expression, and whatever
additional annotation I find useful.

With such objects, I can associate methods that take as their
arguments sequence data and return the list of locations in which
that particular enzyme has a recognition site in the sequence. Sounds
good, let's start coding!
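
The one enzyme/one object idea can be sketched roughly as follows.
This is only an illustration, not the design the chapter settles on:
the class name Enzyme, its attribute names, and the locations method
are all hypothetical, and the regular expression is supplied by hand
rather than derived from Rebase data.

```perl
use strict;
use warnings;

# Hypothetical class: one object per restriction enzyme.
package Enzyme;

sub new {
    my ($class, %arg) = @_;
    my $self = {
        _name  => $arg{name},
        _site  => $arg{site},     # recognition site as written in Rebase
        _regex => $arg{regex},    # the same site as a Perl regex string
    };
    return bless $self, $class;
}

sub name { my ($self) = @_; return $self->{_name}; }

# Return the 1-based start position of every recognition site,
# including overlapping ones (hence the zero-width lookahead).
sub locations {
    my ($self, $sequence) = @_;
    my $regex = $self->{_regex};
    my @positions;
    while ($sequence =~ /(?=$regex)/ig) {
        push @positions, pos($sequence) + 1;
    }
    return @positions;
}

package main;

my $ecori = Enzyme->new(name => 'EcoRI', site => 'G^AATTC', regex => 'GAATTC');
my @found = $ecori->locations('AAGAATTCTTGAATTCAA');
print $ecori->name, ": @found\n";
```

Each object knows only its own enzyme, so a map for several enzymes
means one such call per object.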

But wait. What happens if, as is often the case, you want to find
the sites of multiple restriction enzymes in a sequence and display
the resulting map? With my design, you'd have to find the object
associated with each restriction enzyme, pass the sequence to it,
collect the locations, and then combine the individual lists of
locations in order to display the map. This can be slow (finding the
right objects, one for each restriction enzyme) and inconvenient
(combining the output of the various methods from the various
objects).

You recognize this questioning as an essential step in program
design: thinking about the problem and considering alternative
ways to write code that solves it. I reprise the idea here because, so
far, I've been simply presenting and discussing
solutions. Although that is neat and tidy, it
isn't really the way programming works. Programming
often involves thinking of alternative program strategies, comparing
them, coding the most promising alternatives as prototypes and
testing them (i.e., benchmarking), and finally deciding on an
approach to implement.

So, in that spirit, what alternatives to the one enzyme/one object
approach just described come to mind? The Rebase database is
essentially a key/value lookup database, in which the key is the
enzyme name and the value is the recognition site or annotation
(the database actually provides several datafiles). But
I'm most interested in getting the recognition site,
translating it to a Perl regular expression, and reporting on the
locations in some sequence data. A nice interface to display some of
the annotation of the restriction enzyme would also be useful.
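
As a rough sketch of the translation step, the following assumes that
a recognition site is written with the standard IUPAC nucleotide
ambiguity codes and an optional ^ marking the cut position. The
helper name site_to_regex is hypothetical, and site notations with
other punctuation (such as cut positions given in parentheses) are
deliberately not handled.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
my %iupac = (
    A => 'A',      C => 'C',      G => 'G',      T => 'T',
    R => '[AG]',   Y => '[CT]',   S => '[CG]',   W => '[AT]',
    K => '[GT]',   M => '[AC]',
    B => '[CGT]',  D => '[AGT]',  H => '[ACT]',  V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical helper: translate a simple recognition site into a
# regular expression string. Dies on any character it does not know.
sub site_to_regex {
    my ($site) = @_;
    $site =~ s/\^//g;    # the cut-position marker is not part of the pattern
    my $regex = '';
    for my $base (split //, uc $site) {
        die "Unknown base code '$base'\n" unless exists $iupac{$base};
        $regex .= $iupac{$base};
    }
    return $regex;
}

print site_to_regex('G^AATTC'), "\n";    # EcoRI
print site_to_regex('GTY^RAC'), "\n";    # HincII
```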

Any key/value type of data immediately brings the hash data structure
to the mind of the Perl programmer. As you know from my introduction
to object-oriented programming, the hash data structure is also the
most useful way to implement an object.

So, perhaps instead of many objects, one for each restriction enzyme,
you may want to consider one object that provides the fast lookup of
a value (the recognition site and regular expression) for each key
(the name of the restriction enzyme). Clearly, this can be
implemented as a hash. Other attributes can hold the sequence and the
map as an array of the positions in the sequence in which the
recognition sites exist. Methods for the
object could extract the site, the regular expression, and perhaps
some annotation, for each enzyme. A method can also locate the
recognition sites for an enzyme in the sequence.
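
A minimal sketch of this single-object alternative might look like
the following. The class name RebaseSketch, the method names, and the
hand-entered enzyme data are assumptions made for illustration; a
real version would fill the hash from the Rebase datafiles.

```perl
use strict;
use warnings;

# Hypothetical single-object design: one hash keyed by enzyme name.
package RebaseSketch;

sub new {
    my ($class, %enzymes) = @_;
    # Each value is a hash ref holding the site and its regex string.
    my $self = { _enzymes => { %enzymes } };
    return bless $self, $class;
}

# Look up one attribute of one enzyme; undef if the enzyme is unknown.
sub _lookup {
    my ($self, $name, $field) = @_;
    return undef unless exists $self->{_enzymes}{$name};
    return $self->{_enzymes}{$name}{$field};
}

sub get_site  { my ($self, $name) = @_; return $self->_lookup($name, 'site');  }
sub get_regex { my ($self, $name) = @_; return $self->_lookup($name, 'regex'); }

# All enzyme names, sorted (call in list context).
sub get_names {
    my ($self) = @_;
    return sort keys %{ $self->{_enzymes} };
}

package main;

my $rebase = RebaseSketch->new(
    EcoRI   => { site => 'G^AATTC', regex => 'GAATTC' },
    HindIII => { site => 'A^AGCTT', regex => 'AAGCTT' },
);
my @names = $rebase->get_names;
print "@names\n";
print $rebase->get_site('HindIII'), ' ', $rebase->get_regex('HindIII'), "\n";
```

Finding an enzyme is now a single hash lookup instead of a search
through a collection of objects.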

If we go that way, how will we manage the actual restriction maps
that are made? A restriction map has as input some sequence and a
list of restriction enzymes, and has as output a list of the
locations where the enzymes have recognition sites in the sequence.
Should there be another kind of object, a
Restriction object,
that has attributes of sequence, enzyme names, and locations of
recognition sites?

Perhaps we can use the SeqFileIO class from Chapter 4 as a base class for a new derived class that
adds attributes for restriction maps on the sequence.

That might be possible, but it combines file manipulations with
restriction mapping and seems, at best, a shotgun wedding.

So, after careful reflection, consultation with colleagues, a lab
meeting, pressure from the PI, an opinion from an outside expert, and
some quick and dirty Perl scripts to see some alternatives in action,
a decision is reached. We'll make a big
Rebase object to hold the enzyme/recognition site
data, plus a new Restriction object that holds the
sequence and the locations of the recognition sites (the
"map"). The Restriction class will provide the
methods needed to calculate a restriction map.
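
To make the division of labor concrete, here is a small sketch of
what an object holding the sequence and the map could look like. It
is not the class developed in this chapter: the name
RestrictionSketch, the method map_enzymes, and the attribute names
are hypothetical, and a plain hash of enzyme names to regular
expressions stands in for the Rebase object.

```perl
use strict;
use warnings;

# Hypothetical class holding a sequence and its restriction map.
package RestrictionSketch;

sub new {
    my ($class, %arg) = @_;
    my $self = {
        _rebase   => $arg{rebase},      # stand-in: hash ref, name => regex string
        _sequence => $arg{sequence},
        _map      => {},                # name => array ref of 1-based positions
    };
    return bless $self, $class;
}

# Locate the recognition sites of each named enzyme and store them
# in the map. Enzymes missing from the lookup data are skipped.
sub map_enzymes {
    my ($self, @names) = @_;
    my $sequence = $self->{_sequence};
    for my $name (@names) {
        my $regex = $self->{_rebase}{$name};
        next unless defined $regex;
        my @positions;
        while ($sequence =~ /(?=$regex)/ig) {
            push @positions, pos($sequence) + 1;
        }
        $self->{_map}{$name} = \@positions;
    }
    return $self->{_map};
}

package main;

my %regex_for = (EcoRI => 'GAATTC', HindIII => 'AAGCTT');
my $restrict  = RestrictionSketch->new(
    rebase   => \%regex_for,
    sequence => 'AAGAATTCTTAAGCTTGAATTCAA',
);
my $map = $restrict->map_enzymes('EcoRI', 'HindIII', 'NoSuchEnzyme');
for my $name (sort keys %$map) {
    print "$name: @{ $map->{$name} }\n";
}
```

Because the sequence and all the locations live in one object, the
combined map is available in one place for later display.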

One of the considerations that led to this decision was that, at some
point, it will be necessary to graphically display the restriction
map; an object that contains the sequence and the map (the locations
of the recognition sites in the sequence) will be well suited for
adding some graphics capabilities.

Finally, in this chapter we'll use the
Restriction class as a base class to develop a
Restrictionmap class that does have some graphics
capabilities.

For more discussion of how to design the component parts of this
software development project, see the exercises at the end of the
chapter.

