Mastering Perl for Bioinformatics

5.6 Exercises

Exercise 5.1

Why use the object-oriented approach for the interface to the Rebase
database at all? What are the benefits and detriments of going to the
object-oriented style?

Exercise 5.2

The Restriction.pm module uses another module in a
new way. Instead of inheriting from the Rebase
class, it requires that a Rebase object be passed
to the constructor Restriction->new to become
one of the attributes of the Restriction object.
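The arrangement described above, in which one object is handed to another object's constructor and kept as an attribute, can be sketched with two toy classes. ToyRebase and ToyRestriction, along with their methods and attribute names, are hypothetical stand-ins invented for this illustration; they are not the Rebase.pm and Restriction.pm code from the chapter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Rebase class: maps enzyme names to sites.
package ToyRebase;

sub new {
    my ($class, %sites) = @_;
    return bless { _sites => {%sites} }, $class;
}

sub get_site {
    my ($self, $enzyme) = @_;
    return $self->{_sites}{$enzyme};
}

# Hypothetical stand-in for Restriction: it stores a ToyRebase object
# as an attribute instead of inheriting from ToyRebase.
package ToyRestriction;

sub new {
    my ($class, %arg) = @_;
    die "new requires a rebase object\n" unless ref $arg{rebase};
    return bless {
        _rebase   => $arg{rebase},
        _sequence => $arg{sequence},
    }, $class;
}

sub site_for {
    my ($self, $enzyme) = @_;
    # Delegate the lookup to the contained object.
    return $self->{_rebase}->get_site($enzyme);
}

package main;

my $rebase   = ToyRebase->new(EcoRI => 'GAATTC');
my $restrict = ToyRestriction->new(rebase => $rebase, sequence => 'ACGAATTCCG');
print $restrict->site_for('EcoRI'), "\n";   # prints GAATTC
```

Note that ToyRestriction has no @ISA entry: it reaches the lookup data only through the object it was given.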

Consider alternative ways to write this code. Can
Restriction inherit from Rebase and
achieve the same functionality? If so, write the code. Or, can the
same functionality be achieved by some method that avoids having a
Rebase object passed as an argument to a
Restriction object? If so, write the code.

Exercise 5.3

Go to CPAN and read the documentation about the MLDBM module. It
allows you to use a DBM file to store and retrieve complex data.
Rewrite the Rebase.pm module to use
MLDBM and replace my use of space-separated
strings of recognition sites and regular expressions.
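As a rough illustration of the idea behind storing complex data in a DBM file, the sketch below serializes a nested structure by hand using only core modules (SDBM_File and Storable). It does not use MLDBM itself, whose interface you should take from its own documentation. The key names, the record layout, and the flat "site regexp" string are assumptions made for this example and are not taken from Rebase.pm.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

# Work in a temporary directory that is removed when the program exits.
my $dir = tempdir(CLEANUP => 1);

my %dbm;
tie(%dbm, 'SDBM_File', "$dir/rebase_demo", O_RDWR | O_CREAT, 0644)
    or die "Cannot tie DBM file: $!";

# A plain DBM value must be a string, hence a space-separated encoding
# (hypothetical layout: recognition site, then regular expression).
$dbm{EcoRI_flat} = 'GAATTC GAATTC';
my ($site, $regexp) = split ' ', $dbm{EcoRI_flat};

# Serializing the value lets one key hold a nested structure instead.
$dbm{EcoRI} = freeze({ sites => ['GAATTC'], regexps => ['GAATTC'] });
my $record = thaw($dbm{EcoRI});

print "flat:   $site $regexp\n";
print "nested: $record->{sites}[0] $record->{regexps}[0]\n";

untie %dbm;
```

SDBM_File limits the size of each key/value pair to roughly a kilobyte, so this hand-rolled approach suits only small records.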

Exercise 5.4

As discussed in the text, there are some interesting considerations
involved in parsing the data that relates to how the restriction
enzymes actually work, such as handling reverse complements of
recognition sites and cut sites. The logic used here to handle
reverse complements might not be ideal for all situations. Review
carefully the logic of the parse_rebase
subroutine. Can you find any problems its logic might cause when you
try to use the software to support a particular experiment?

Exercise 5.5

It would be nice to be able to ask some method in
Restriction.pm if a particular restriction enzyme
produces sticky ends at its cut site. It would also be useful to know
what other enzymes create sticky ends that will anneal with the
sticky ends of this enzyme. Check to see if this information appears
in any of the datafiles of the Rebase database. Can you design a
method that returns this information, given the name of a restriction
enzyme? What changes do you have to make to your database? Do you
need any more datafiles from the Rebase distribution?

Exercise 5.6

Describe in detail how the logic for map_enzyme
works. Can you devise a different way to accomplish the same thing?

Exercise 5.7

The code in this chapter uses the class
Restriction as a base class for the class
Restrictionmap, which lets you make a graphic
display of the restriction map. Would it be a better idea just to add
the graphics capabilities to the Restriction class
instead of deriving a new class from it? Rewrite
Restriction to add the graphics capability to it.
What are the pros and cons of these two different ways of writing and
organizing the code?

Exercise 5.8

In the method _formatrestrictionmap, some lines of
code are commented out that shorten the output by not printing extra
blank lines. Try it out both ways. Do you think it makes the output
less lengthy at the expense of making it more difficult to read? What
is the tradeoff here? Do you prefer the longer or shorter version?
Defend your preference.

Exercise 5.9

Add position numbers to the output of
Restrictionmap. Add the position of the first base
in each line or the position of each restriction enzyme.

Exercise 5.10

The _drawmap_text method of the
Restrictionmap class is a bit lengthy and
involved. See if you can improve the method. Either alter the code in
the book or start from scratch. Improve it by making it faster,
simpler, or easier to read. Try making its output better or adding
options to make the output more flexible. Try any combination of the
above.

Exercise 5.11

String copying is a great way to slow down a program. Consider the
code I gave for the following subroutine:

sub complementIUB {
    my($seq) = @_;
    (my $com = $seq) =~ tr [ACGTRYMKSWBDHVNacgtrymkswbdhvn]
                           [TGCAYRKMWSVHDBNtgcayrkmwsvhdbn];
    return $com;
}

Explain why the subroutine is written in this somewhat slow way. Now,
rewrite this subroutine to eliminate a string copy. (Extra challenge:
there are actually two string copies here. Rewrite the subroutine
another way to eliminate a string copy. Can you eliminate both string
copies? Why or why not?) Also, what's with those
square brackets around the arguments to the tr
function?

Exercise 5.12

Consider the following lines from the subroutine
IUB_to_regexp:

# Remove the ^ signs from the recognition sites
$iub =~ s/\^//g;

This operation is redundant because the caret ^ was removed from the
recognition site in the subroutine parse_rebase.
Why is it included here?

Exercise 5.13

Consider the following last two lines from the subroutine
map_enzyme in Restriction.pm:

@{$self->{_map}{$enzyme}} = @positions;
return @positions;

How does the subroutine behave differently if the first line is
changed to:

$self->{_map}{$enzyme} = \@positions;

Why does the subroutine return the array
@positions since the return value
isn't used in any of the code and the positions are
saved in the object anyway?

Exercise 5.14

There is a difference in behavior and readability between the looping
constructs for(;;) and for( )
or its synonym foreach( ). Try writing some small
test programs that use these different loops and time them using the
Perl modules Benchmark or
Devel::DProf. Clearly, for and
foreach are most useful when iterating through
arrays, and for(;;) is most useful when iterating
through numbers. However, there are places in the code presented in
this chapter in which for(;;) iterates through an
array using a scalar variable as a subscript counter (such as
$i in $array[$i]). Try finding
and rewriting such loops using foreach; benchmark
the two versions.
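One possible starting point for such a test program is sketched below, using the core Benchmark module's cmpthese function. The array contents, the subroutine names, and the iteration count are arbitrary choices made for illustration; timings depend on your machine and Perl version, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = (1 .. 1000);

# C-style loop that uses a scalar as a subscript counter.
sub sum_c_style {
    my $sum = 0;
    for (my $i = 0; $i < @array; $i++) {
        $sum += $array[$i];
    }
    return $sum;
}

# foreach loop that iterates over the elements directly.
sub sum_foreach {
    my $sum = 0;
    foreach my $element (@array) {
        $sum += $element;
    }
    return $sum;
}

# Run each version 5000 times and print a comparison table.
cmpthese(5000, {
    'c_style_for'  => \&sum_c_style,
    'foreach_loop' => \&sum_foreach,
});
```

Before trusting the timings, check that both versions compute the same result, and try several array sizes.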