Mastering Perl for Bioinformatics [Electronic resources] (text version)

2.4 Complex Data Structures

Different algorithms require
different data structures. Using references in Perl, it is possible
to build very complex data structures.

This section gives a short introduction to some of the possibilities,
such as a hash with array values and a two-dimensional array of
hashes. See the recommended reading in Section 2.9 of this chapter for
books and sections of the Perl manual that are very helpful.

Perl uses the basic data types of scalar, array, and hash, plus the
ability to declare scalar references to those basic data types, to
build more complex structures. For instance, an array must have
scalar elements, but those scalar elements can be references to
hashes, in which case you have effectively created an array of
hashes.
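
As a minimal sketch of that idea, the following builds an array whose elements are references to anonymous hashes and then reads a value out of each one. The variable name @genes and the values stored in it are assumptions made for illustration.

```perl
use strict;
use warnings;

# An array whose scalar elements are references to anonymous hashes.
# The gene names and organisms are illustrative values only.
my @genes = (
    { name => 'stromelysin', organism => 'Homo sapiens' },
    { name => 'obesity',     organism => 'Mus musculus' },
);

# Each element is a hash reference, so use -> to reach a key.
for my $gene (@genes) {
    print "$gene->{name} ($gene->{organism})\n";
}

# ref() reports what kind of reference an element holds.
my $kind = ref($genes[0]);
print "Element type: $kind\n";
```

The array itself still holds only scalars; it is the fact that each scalar is a hash reference that makes it behave like an array of hashes.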

2.4.1 Hash with Array Values

A common example of a complex
data structure is a hash with array values. Using such a data
structure, you can associate a list of items with each keyword. The
following code shows an example of how to build and manage such a
data structure. Assume you have a set of human genes, and for each
human gene, you want to manage an array of organisms that are known
to have closely related genes. Of course, each such array of related
organisms can be a different length:

use Data::Dumper;
my %relatedgenes = (  );
$relatedgenes{'stromelysin'} = [
    'C.elegans',
    'Arabidopsis thaliana'
];
$relatedgenes{'obesity'} = [
    'Drosophila',
    'Mus musculus'
];
# Now add a new related organism to the entry for 'stromelysin'
push( @{$relatedgenes{'stromelysin'}}, 'Canis' );
print Dumper(\%relatedgenes);

This program prints the following, although the two keys may appear in
a different order because Perl hashes are unordered. (The very useful
Data::Dumper module is described in more detail later; try typing
perldoc Data::Dumper for the details of this way to print out complex
data structures.)

$VAR1 = {
          'stromelysin' => [
                             'C.elegans',
                             'Arabidopsis thaliana',
                             'Canis'
                           ],
          'obesity' => [
                         'Drosophila',
                         'Mus musculus'
                       ]
        };
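
If you want the dump to come out in the same order on every run, one option is to set the Data::Dumper configuration variables $Data::Dumper::Sortkeys and $Data::Dumper::Indent. The sketch below is an addition to the original program, not part of it, and the variable $dump is introduced only for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys and use a more compact indentation style.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

my %relatedgenes = (
    stromelysin => [ 'C.elegans', 'Arabidopsis thaliana', 'Canis' ],
    obesity     => [ 'Drosophila', 'Mus musculus' ],
);

# Dumper returns a string, so it can be stored as well as printed.
my $dump = Dumper(\%relatedgenes);
print $dump;
```

With keys sorted, the obesity entry is printed before the stromelysin entry.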

The tricky part of this short program is the push.
The first argument to push must be an array. In
the program, this array is
@{$relatedgenes{'stromelysin'}}. Examining this
array from the inside out, you can see that it refers to the value of
the hash with key stromelysin:
$relatedgenes{'stromelysin'}. You know that the
values of this %relatedgenes hash are references
to anonymous arrays. This hash value is contained within a block of
curly braces, which returns the reference to the anonymous array:
{$relatedgenes{'stromelysin'}}, and the block is
preceded by an @ sign that dereferences the
anonymous array: @{$relatedgenes{'stromelysin'}}.
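
The same @{...} form works wherever Perl expects an array. The following sketch, which assumes the same %relatedgenes hash, pushes onto a key that does not exist yet and then loops over every array in the hash. The variables @organisms and $count are illustrative names.

```perl
use strict;
use warnings;

my %relatedgenes = (
    stromelysin => [ 'C.elegans', 'Arabidopsis thaliana' ],
);

# Pushing onto a key that does not exist yet creates the anonymous
# array automatically (autovivification).
push @{ $relatedgenes{'obesity'} }, 'Drosophila', 'Mus musculus';

# Walk the hash in sorted key order and dereference each array value.
for my $gene (sort keys %relatedgenes) {
    my @organisms = @{ $relatedgenes{$gene} };
    my $count     = scalar @organisms;
    print "$gene ($count): ", join(', ', @organisms), "\n";
}
```

Copying the dereferenced array into @organisms is a convenience for readability; for large arrays you may prefer to work through the reference directly.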

2.4.2 Two-Dimensional Array of Hashes

As another example, say you have data
from a microarray experiment in which each location on a plate can be
identified by an x and y location; each location is also associated
with a particular gene and has a set of reported measurements. You
can implement this particular data as a two-dimensional array, each
entry of which is a (reference to a) hash whose keys are gene names
and whose values are (references to) arrays of the measurements.
Here's how you can initialize one of the entries of
that two-dimensional array:

$array[3][4]{'stromelysin'} = [3, 4, 5];

The position on the plate is represented by an entry in the
two-dimensional array such as $array[3][4]. The
fact that the entry is a hash is shown by the reference to a
particular key with {'stromelysin'}. That the
value for that key is an array is shown by the assignment to that key
$array[3][4]{'stromelysin'} of the anonymous array
[3, 4, 5]. To print out the array associated with
the key stromelysin, you have to remember to tell
Perl that the value for that key is an array by surrounding the
expression with curly braces preceded by an @ sign
@{$array[3][4]{'stromelysin'}}:

$array[3][4]{'stromelysin'} = [3, 4, 5];
print "The scores for plate position 3, 4 were @{$array[3][4]{'stromelysin'}}\n";

This prints:

The scores for plate position 3, 4 were 3 4 5
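
To report on every position rather than a single one, you can loop over both dimensions and then over the keys of each hash. This is a sketch under the assumption that most plate positions are empty, so undefined rows and entries are skipped. The second entry for obesity and the @report array are hypothetical additions for illustration.

```perl
use strict;
use warnings;

my @array;
$array[3][4]{'stromelysin'} = [3, 4, 5];
$array[0][1]{'obesity'}     = [7, 8];   # hypothetical second entry

my @report;
for my $x (0 .. $#array) {
    my $row = $array[$x] or next;        # skip rows never assigned
    for my $y (0 .. $#{$row}) {
        my $genes = $row->[$y] or next;  # skip empty positions
        for my $name (sort keys %{$genes}) {
            push @report, "($x, $y) $name: @{ $genes->{$name} }";
        }
    }
}
print "$_\n" for @report;
```

Reading $array[$x] and $row->[$y] into temporary variables keeps the innermost dereference short and avoids creating empty entries while scanning.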

A common Perl trick is to dereference a complex data structure by
enclosing the whole thing in curly braces and preceding it with the
correct symbol: $, @, or %. So, take a moment and reread the last
example. Do you see how the following:

 $array[3][4]{'stromelysin'}

looks up the value for a hash key? Do you see how the phrase:

@{$array[3][4]{'stromelysin'}}

makes it clear that the value for that hash key is a reference to an
array? Similarly, if the value for that hash key were a reference to a
scalar, you could say:

 ${$array[3][4]{'stromelysin'}}

and if the value for that hash key were a reference to a hash, you could say:

 %{$array[3][4]{'stromelysin'}}
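
The three sigils can be seen side by side in the following sketch. The keys label, scores, and probe, and the values stored under them, are hypothetical; they exist only to give each kind of reference something to point at.

```perl
use strict;
use warnings;

my @array;
my $note = 'replicate 2';                             # illustrative value
$array[3][4]{'label'}  = \$note;                      # reference to a scalar
$array[3][4]{'scores'} = [3, 4, 5];                   # reference to an array
$array[3][4]{'probe'}  = { id => 'P17', len => 25 };  # reference to a hash

# Braces around the expression, then the sigil for the type you expect.
my $label  = ${ $array[3][4]{'label'} };
my @scores = @{ $array[3][4]{'scores'} };
my %probe  = %{ $array[3][4]{'probe'} };

print "Label: $label\n";
print "Scores: @scores\n";
print "Probe $probe{id} has length $probe{len}\n";
```

In each case the value stored in the hash is itself a plain scalar (the reference); the sigil only says how to interpret what it points to.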

2.4.3 Complex Data Structures

References give you a fair amount of flexibility. For example, your
data structures can combine references to different types of data.
You can have an anonymous array such as in the following short
program:

my $gene = [
    # hash of basic information about the gene name, discoverer,
    #  discovery date and laboratory.
    {
        name       => 'antiaging',
        reference  => [ 'G. Mendel', '1865'],
        laboratory => [ 'Dept. of Genetics', 'Cornell University', 'USA']
    },
    # scalar giving priority
    'high',
    # array of local work history
    ['Jim', 'Rose', 'Eamon', 'Joe']
];
print "Name is ", ${$gene->[0]}{'name'}, "\n";
print "Priority is ", $gene->[1], "\n";
print "Research center is ", ${${$gene->[0]}{'laboratory'}}[1], "\n";
print "These individuals worked on the gene: ", "@{$gene->[2]}", "\n";

This program produces the output:

Name is antiaging
Priority is high
Research center is Cornell University
These individuals worked on the gene: Jim Rose Eamon Joe

Let's examine this code to understand how it works;
it contains most of the points made in this chapter.

$gene is a reference to an anonymous array of three
elements. Therefore each element of that array is
referred to by either:

$$gene[0]
$$gene[1]
$$gene[2]

or equivalently (and our choice in this code) by:

$gene->[0]
$gene->[1]
$gene->[2]
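
A quick way to convince yourself that the two notations are interchangeable is to compare them directly. The sketch below uses a simplified three-element array of plain strings rather than the full $gene structure; the variables $a1, $a2, and $a3 are illustrative names.

```perl
use strict;
use warnings;

# Simplified stand-in for the $gene array: three plain strings.
my $gene = [ 'antiaging', 'high', 'Cornell University' ];

my $a1 = $$gene[1];     # sigil form
my $a2 = ${$gene}[1];   # same thing, with an explicit block
my $a3 = $gene->[1];    # arrow form

print "All three forms agree\n" if $a1 eq $a2 and $a2 eq $a3;
print "The array has ", scalar(@{$gene}), " elements\n";
```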

To be specific, the first element is a reference to an anonymous
hash, the second element is a scalar string high,
and the third element is a reference to an anonymous
workgroup array.

The plot thickens when you examine the anonymous hash that is
referenced by the first array element. It has three keys, one of
which, name, has a simple scalar value. The other
two keys have values that are references to anonymous arrays of
scalar strings.

So, this certainly qualifies as a complex data structure!

When you place either of the reference elements (the first or the
third) of the $gene anonymous array within a block of curly
braces, you have a reference that must be dereferenced appropriately.
To refer to the entire hash at the beginning of the array, say:

%{$gene->[0]}

As done with the program code, the scalar value that is the second
element of the array is accessed simply as:

$gene->[1]

The third part of this data structure is an anonymous array, which we
can refer to in total as:

@{$gene->[2]}

This is also done in the program code.
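
Once the whole hash or the whole array has been dereferenced, the usual built-in functions apply to it. The following sketch uses a shortened copy of the $gene structure; the variables @fields and $workers are names introduced here for illustration.

```perl
use strict;
use warnings;

# Shortened copy of the $gene structure from the program above.
my $gene = [
    { name => 'antiaging', reference => [ 'G. Mendel', '1865' ] },
    'high',
    [ 'Jim', 'Rose', 'Eamon', 'Joe' ],
];

# Dereference the whole hash to list its keys.
my @fields = sort keys %{ $gene->[0] };
print "Fields: @fields\n";

# Dereference the whole array to count and loop over its elements.
my $workers = scalar @{ $gene->[2] };
print "$workers people worked on the gene:\n";
print "  $_\n" for @{ $gene->[2] };
```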

Now, let's finish by looking into the first element
of the $gene anonymous array. This is a reference
to an anonymous hash. One of the keys of that hash has a simple
scalar string value, which is referenced with:

${$gene->[0]}{name}

as was done in the program code. To make sure we understand this,
let's write it out:

${$gene->[0]}{name}
is
$ hashref    {name}
is
'antiaging'

{$gene->[0]} is a block containing a reference
to an anonymous hash. It is then used as is typical for a hash
reference: it's preceded by a $
and followed by the key name in curly braces and
so resolves to a lookup of the key name in the
anonymous hash.

The most intricate dereference in this program is that which digs out
the name of the research center:

${${$gene->[0]}{laboratory} }[1]
is
${$ hashref    {laboratory} }[1]
is
$  arrayref                  [1]
is
'Cornell University'

Here, the block {$gene->[0]} holds a reference to an
anonymous hash. The value for the key laboratory
is retrieved from that anonymous hash; the value is a reference to an
anonymous array. Finally, that array reference
${$gene->[0]}{laboratory} is enclosed in a
block of curly braces, preceded by a $, and
followed by an array index 1 in square brackets,
which dereferences the anonymous array and returns the second element,
Cornell University.

Note that the last expression can also be written as:

$gene->[0]->{laboratory}->[1]

You see how the use of references within blocks enables you to
dereference some rather deeply nested data structures. I urge you to
take the time to understand this example and to use the resources
listed in Section 2.9.
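
For comparison, the following sketch retrieves the research center three ways: with nested blocks, with explicit arrows, and with the arrows between adjacent subscripts left out, which Perl also permits. It uses a shortened copy of the $gene structure, and the variables $block, $arrows, and $short are illustrative names.

```perl
use strict;
use warnings;

# Shortened copy of the $gene structure: only the first element.
my $gene = [
    {
        name       => 'antiaging',
        laboratory => [ 'Dept. of Genetics', 'Cornell University', 'USA' ],
    },
];

my $block  = ${ ${ $gene->[0] }{laboratory} }[1];   # nested blocks
my $arrows = $gene->[0]->{laboratory}->[1];         # explicit arrows
my $short  = $gene->[0]{laboratory}[1];             # arrows between subscripts omitted

print "$block\n$arrows\n$short\n";
```

All three expressions return the same element; which one to use is largely a matter of readability.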
