Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Mastering Perl for Bioinformatics [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید










2.6 Data Structures in Action


The previous sections introduced a fair amount of new Perl syntax and
capabilities. Now, let's see some of these new
capabilities in action.


2.6.1 The Problem of String Matching


It
is frequently important in biology to find the best possible match
for a short sequence in a longer sequence; for example, between an
oligonucleotide and the sequence of DNA that has been cloned in a YAC
or BAC. This match need not always be perfect; frequently, what is
important is to find the closest match available. This problem is
known in computer science as approximate string matching, and dynamic
programming is a popular technique used to compute the solution.

The problem of string matching is to find a pattern, such as a
nucleotide or peptide fragment, in a longer text such as a chromosome
or protein.

The problem of approximate string matching is to find a pattern in a
text in which the match might not be perfect. Perhaps a few of the
characters are different or missing; the problem is to find the best
match possible.


2.6.2 Genetic Variability and String Matching


Biologically,
approximate matches are of commanding importance. Evolutionary
changes between species can make genes with essentially the same
function collect a fair number of individual base changes; they may
even have acquired differences in exon structure. Even within a
species, individual base changes among groups in the population
(single nucleotide polymorphisms) are important causes of disease and
important clues in the discovery of disease-causing genes.

Mutations tend to accumulate over time in noncoding regions of DNA;
mutations in coding regions tend to avoid altering critical regions
essential for the functioning of the gene (where mutations may be
fatal to the organism). Even a noncoding region may be critical to
the regulation of a gene and thus tend to resist mutations. As a
result, studying where mutations are not accumulating is often an
important clue to discerning the function and control of a gene and
its associated protein.

Due to the redundancy of the genetic code, mutations in DNA may not
affect the coding of the associated protein. Other mutations may make
a change in a coded amino acid, but it may be an amino acid with
similar properties to the original amino acid; for instance, both
amino acids may be hydrophilic. Tracing these kinds of mutations is
another source of important information about the process of
mutation. It can give vital clues to the conservation of critical
coding regions and to the folding of proteins.

Biologists will have no difficulty expanding upon the preceding brief
motivation for studying approximate string matching and other
techniques that find similarities between biological molecules. Many
standard laboratory techniques rely on the annealing of a molecule to
another molecule with similar, but not necessarily exact, structure.

In the discussion that follows, you'll use your new
knowledge of complex data structures in Perl to implement an
algorithm that finds approximate matches of patterns in text, such as
DNA fragments in chromosomes or peptide fragments in proteins.

The algorithm I present is an example of dynamic programming; it
relies on computing the entries to a matrix.


/ 156