Mastering Perl for Bioinformatics [Electronic resources] (text version)

4.5 Exercises


Exercise 4.1



Write an object-oriented module DNAsequence whose
object has one attribute, a sequence of DNA, and two methods,
get_dna and set_dna. Start with
the code for Gene.pm, but see how far you can
whittle it down to the minimum amount of code necessary to implement
this new class.
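
As a starting point for comparison, here is a minimal sketch of such a class. It is not derived from Gene.pm; the constructor argument name dna and the internal hash key _dna are assumptions made for this illustration.

```perl
package DNAsequence;
use strict;
use warnings;

# Constructor: takes an optional dna => 'ACGT' argument (argument name assumed).
sub new {
    my ($class, %arg) = @_;
    my $self = { _dna => defined $arg{dna} ? $arg{dna} : '' };
    return bless $self, $class;
}

# Accessor: return the stored DNA sequence.
sub get_dna {
    my ($self) = @_;
    return $self->{_dna};
}

# Mutator: replace the stored DNA sequence and return the new value.
sub set_dna {
    my ($self, $dna) = @_;
    $self->{_dna} = $dna;
    return $self->{_dna};
}

package main;

my $seq = DNAsequence->new(dna => 'ACGT');
$seq->set_dna('GATTACA');
print $seq->get_dna, "\n";
```

A version like this gives up the attribute tables, AUTOLOAD, and object counting; part of the exercise is deciding which of those you actually miss.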


Exercise 4.2



The FileIO.pm module implements objects that read
and write file data. However, depending on the program, these objects
can deviate substantially from what is actually present in the files on
your computer. For instance, you can read in all the files in a folder
and then change the filenames and data of all the objects without
writing them out. Is this a good thing or a bad thing?


Exercise 4.3



In the text, you are asked why the new constructor
for FileIO.pm has been whittled down to the bare
bones. You can see that all it does is create an empty object. What
functionality has been moved out of the new
constructor and into the read and
write methods? Does it make more sense to do
without a new constructor entirely and instead
have the read and write methods
create objects? Try rewriting the code that way. Alternatively, does it
make sense to try rewriting the code so that both reading and writing
are handled by the new constructor? Is creating an
object sometimes logically distinct from initializing it?


Exercise 4.4



Use FileIO.pm as a base class for a new class that
manages the annotation of a pipeline in your laboratory. For example,
perhaps your lab gets sequence from your ABI machine, screens it for
vectors, assesses the quality of the sequencing run, searches your
local database to determine if you've seen it or
something like it before, then searches GenBank to see what other
known sequences it matches or resembles, and finally adds it to an
assembly project. Each step has a person or persons, a timestamp for
the beginning and ending of each phase, and data. You want to be able
to track the work done on each sequence that emerges from your ABI.
(This is just an example. Pick a set of jobs that you actually do in
your lab.)


Exercise 4.5



For each sequence file format handled by the
SeqFileIO.pm module, find the documentation that
specifies the format. Compare the documentation with the
is_, parse_, and
put_ methods that recognize, read, and write files in
each format. How can you improve this code? Can you make it more
complete? Faster?


Exercise 4.6



My parse_ methods are somewhat ad hoc. They
don't really parse the whole file according to the
definition of the format. They just extract the sequence and a small
amount of annotation. Take one of the formats and write a more
complete parser for it. What are the advantages and disadvantages of
a simple versus a more complete parser in this code? How about for
other applications you may want to develop in the future?


Exercise 4.7



Use the parser you developed in Exercise 4.6 to do a more complete
job of identifying a file in the same format in the
module's is_ method.


Exercise 4.8



Add a new sequence file format to SeqFileIO.


Exercise 4.9



In FileIO.pm, and in many other places in this
book, the program calls croak and exits when a
problem arises (such as when unsuccessfully attempting to open a file
for reading). Such drastic measures are sometimes desirable; for
example, you may want to kill the program if a security problem is
discovered in which someone is attempting to read a forbidden file.
Or, when developing software, you may like your program to print an
informative message and die when a problem occurs, as that might help
you develop the program faster.

However, very often what you really want is for the program to notice
the error and take some appropriate steps, not simply die. If a file
cannot be opened, it may be something as simple as the user of the
program mistyping the filename, and what you'd like
is to give the user another couple of chances to type the name in
correctly. Rewrite FileIO.pm without calling
croak. This may entail checking for the success or
failure of certain operations and taking reasonable actions on
failure. Should the class module take all such actions, or should the
program that uses the class module be expected to behave
appropriately when a failure is reported?
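
One way to frame the choice is sketched below, assuming the module reports failure and the calling program decides how to react. The subroutines read_lines and read_first_available are hypothetical names for this illustration and are not part of FileIO.pm.

```perl
use strict;
use warnings;

# Hypothetical reader: returns a reference to the file's lines on success.
# On failure it stores a message in $$err_ref and returns undef instead of croaking.
sub read_lines {
    my ($filename, $err_ref) = @_;
    my $fh;
    unless (open($fh, '<', $filename)) {
        $$err_ref = "Cannot open '$filename': $!";
        return;
    }
    my @lines = <$fh>;
    close($fh);
    return \@lines;
}

# The caller chooses the recovery policy: try each candidate filename in turn,
# giving up after three attempts (the limit is an arbitrary choice).
sub read_first_available {
    my (@candidates) = @_;
    my $attempts = 0;
    for my $name (@candidates) {
        last if ++$attempts > 3;
        my $error;
        my $lines = read_lines($name, \$error);
        return $lines if defined $lines;
        warn "$error\n";
    }
    return;
}
```

In an interactive program, the list of candidates could be replaced by prompting the user again for a filename. The point of the sketch is only that the reader reports the error and leaves the policy to its caller.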


Exercise 4.10



The AUTOLOAD method in
FileIO.pm tests for attributes that are scalars
and references to arrays. The need for this comes from the list of
attributes given in the %_my_attribute_properties
hash. Each attribute hash value is an anonymous array with two
elements: default value and properties. From the default value you
can see that a value is either a scalar (a string in this case) or an
anonymous array (a reference to an array). The code that
AUTOLOAD installs for accessor routines then
checks if the attribute is either a scalar or a reference to an
array.

This AUTOLOAD method is inherited by
SeqFileIO.pm. One of the modifications that
SeqFileIO.pm makes is defining its own
%_my_attribute_properties to handle the new
attributes that it defines, such as _sequence. In
this case, all the attributes are either scalars or references to
arrays, as before. What modifications are necessary if some other
data type is needed for a new attribute by a class that inherited
FileIO.pm? How can you rewrite
FileIO.pm to make it easier to write classes that
inherit it?


Exercise 4.11



The test program testSeqFileIO has certain
shortcomings. For one thing, it repeats blocks of code that can be
replaced with a short loop (with a little rewriting). Another problem
is that it doesn't test everything in the class.

Rewrite testSeqFileIO so that
it's clearer and more comprehensive. By default,
make it just give a short summary of the number of tests performed
and the number of tests passed, but add a verbose
flag so that it prints out all its tests in detail when desired. The
module SeqFileIO.pm is lacking POD
documentation. Add POD documentation to the module that can be
easily cut and pasted into a test program for the module.
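
The loop-plus-summary structure might look like the following sketch. The subroutine run_tests and the two placeholder tests are assumptions for illustration; a real rewrite would put calls to SeqFileIO methods inside the code references.

```perl
use strict;
use warnings;

# Run a list of [description, code ref] pairs.
# A test passes if its code ref returns true without dying.
# Returns (number passed, number run); prints per-test detail only when $verbose is true.
sub run_tests {
    my ($verbose, @tests) = @_;
    my $passed = 0;
    for my $test (@tests) {
        my ($description, $code) = @$test;
        my $ok = eval { $code->() } ? 1 : 0;
        $passed += $ok;
        print(($ok ? "ok" : "not ok"), " - $description\n") if $verbose;
    }
    return ($passed, scalar @tests);
}

# Placeholder tests standing in for real checks of the class.
my @tests = (
    ['uppercase conversion', sub { uc('acgt') eq 'ACGT' }],
    ['sequence length',      sub { length('ACGT') == 4 }],
);

my ($passed, $total) = run_tests(0, @tests);
print "$passed of $total tests passed\n";
```

Perl's standard Test::More module offers a fuller version of the same idea and may be a better base for a real test program.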


Exercise 4.12



In SeqFileIO.pm, the hash
%_all_attribute_properties changed from the base
class and needed to be redefined. However, the code for the
_all_attributes,
_attribute_default, and
_permissions helper methods
didn't change. Why then did the new class
SeqFileIO redefine these methods? (Hint: are these
helper methods closures?) SeqFileIO.pm is also
lacking POD documentation. Try adding POD documentation to the module
so that it can be easily cut and pasted into a test program for the
module.


Exercise 4.13



The h2xs program that ships with Perl simplifies
module creation, and even helps you create the
Makefile.PL that you'll need to
add your own module to CPAN or to your local installation (which
helps you bypass the somewhat awkward use
lib directive that appears in the programs in this
book). See also the perlxstut,
ExtUtils::MakeMaker, and
AutoLoader manpages. In particular, see the
-X option to h2xs. Write a
module starting from the use of h2xs.


Exercise 4.14



The open calls in the read
methods of the classes in this chapter specify a filehandle
FileIOFH. Alternatives include using lexical
scalars as filehandles or the IO::Handle package.
Rewrite the read methods so files are opened with
these alternative types of filehandles. What costs or benefits result
from these rewritings? (See the perlopentut part
of the Perl documentation.)
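
The two alternatives can be sketched as follows. These are standalone subroutines with hypothetical names, not the read methods from this chapter; IO::File is used here because it is the subclass of IO::Handle that opens files.

```perl
use strict;
use warnings;
use IO::File;

# Lexical scalar filehandle with three-argument open.
# The handle is closed automatically when $fh goes out of scope.
sub read_with_lexical {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or return;
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# IO::File object: the same operations written as method calls.
sub read_with_io_file {
    my ($filename) = @_;
    my $fh = IO::File->new($filename, 'r') or return;
    my @lines = $fh->getlines;
    $fh->close;
    return @lines;
}
```

Both versions avoid a package-global filehandle name, so two objects reading at the same time cannot interfere with each other's handle.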


Exercise 4.15



In the AUTOLOAD method, a copy of the file data is
returned from the get_filedata accessor; this will
protect the actual file data in the object, but it makes a copy of a
potentially very large amount of data, which can overtax your system.
Discuss alternatives for this behavior, and implement one of them.
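
Two of the possible alternatives are sketched below next to the copying behavior. The class FileDataDemo and its method names are hypothetical and only stand in for the relevant part of FileIO.pm.

```perl
package FileDataDemo;   # hypothetical class for illustration
use strict;
use warnings;

sub new {
    my ($class, @lines) = @_;
    return bless { _filedata => [@lines] }, $class;
}

# Copying accessor: protects the object's data but duplicates every line on each call.
sub get_filedata {
    my ($self) = @_;
    return @{ $self->{_filedata} };
}

# Reference-returning accessor: no copy is made, but the caller can now
# modify the object's data through the reference.
sub get_filedata_ref {
    my ($self) = @_;
    return $self->{_filedata};
}

# Iterator: returns a closure that hands back one line per call and undef at the end,
# so the caller never holds a second copy of the whole file.
sub filedata_iterator {
    my ($self) = @_;
    my $i = 0;
    return sub {
        return $i < @{ $self->{_filedata} } ? $self->{_filedata}[$i++] : undef;
    };
}

package main;
```

The reference version trades safety for memory, while the iterator keeps the data protected at the cost of a less convenient interface.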


Exercise 4.16



Reading in a few hundred large files (as can easily happen with the
modules in this chapter) can overtax your system, causing the system,
or at least the program, to crash. Design two alternative methods
that avoid this overuse of memory. For instance, you can avoid
reading in a file until the sequence data is actually needed. You can
also reread the data into the program each time needed but not save
it in your object. Finally, you can reclaim memory from older files.
Implement one of these methods or some other. What other parts of the
code need to be altered?
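
The first and third suggestions (delaying the read, and reclaiming memory) can be combined, as in this sketch. The class LazyFile and its methods are hypothetical and do not reflect the FileIO.pm interface.

```perl
package LazyFile;   # hypothetical class for illustration
use strict;
use warnings;

# The constructor records only the filename; no file data is read yet.
sub new {
    my ($class, $filename) = @_;
    return bless { _filename => $filename, _filedata => undef }, $class;
}

# The file is read the first time the data is requested, then cached.
# Returns a reference to the lines, or undef if the file cannot be opened.
sub get_filedata {
    my ($self) = @_;
    unless (defined $self->{_filedata}) {
        open(my $fh, '<', $self->{_filename}) or return;
        $self->{_filedata} = [ <$fh> ];
        close($fh);
    }
    return $self->{_filedata};
}

# Drop the cached data so Perl can reuse the memory; a later
# get_filedata call rereads the file.
sub release {
    my ($self) = @_;
    $self->{_filedata} = undef;
    return;
}

# Report whether the data is currently held in memory.
sub is_loaded {
    my ($self) = @_;
    return defined $self->{_filedata} ? 1 : 0;
}

package main;
```

A design like this means the object can go stale if the file changes between reads, which ties back to the question raised in Exercise 4.2.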



