Mastering Perl for Bioinformatics [Electronic resources] (text version)

6.1 One Perl, Many Databases


There comes a time when disk files or the simple DBM hash database
(that you've seen in previous chapters) just
won't manage the data of a medium- or large-size
project, and you must turn to relational databases. Although they
take quite a bit more effort to set up and to program, they offer a
standard and reliable way to store data and to ask questions about
it.

Two things make relational databases standard. First, they all
follow a certain model of data structures, the
relational model. These data structures have become a
fixture in the computing world; they combine a level of constraint
and flexibility that has proved its usefulness in many areas,
including bioinformatics.

Second, almost all relational databases are programmed in a
language called the Structured Query Language, or SQL.
This is a fairly simple language that creates, populates, queries,
and manages the kind of data structures relational databases provide.
The combination of a standard data structure with a standard
programming language is another reason relational databases have
become so successful.

One thing that's not standard is the proliferation
of relational database companies and their penchant for doing things
their own way. This may sometimes be a marketing decision, but
it's more often the natural process of
evolution: different sets of programmers have different
ideas and make different implementations.[1]

[1] When I
first released software that used Perl for bioinformatics, I received
a letter arguing that because C was available everywhere and Perl
wasn't, Perl for bioinformatics was therefore a Bad
Idea and I should use C instead. Of course, Perl is available
everywhere now, including on the VMS systems that my correspondent
was complaining about, and bioinformatics software is written in a
variety of languages. He made the classic mistake of wanting to
standardize a field long before it had settled down.


These differences matter when you have a working database application
that uses a particular DBMS such as Oracle, and you find that you
have to port the application to work on another DBMS such as MySQL.
Perhaps another database system has become significantly faster or
cheaper, or your computer is replaced with a new one that supports a
different database, or your computer center or CIO decrees that some
new DBMS is now the mandatory standard. If your database application
makes extensive use of a feature that is available only on your old
DBMS, you'll have a lot of work ahead of you
rewriting your software to make it work on the new DBMS.

Luckily, thanks to some expert Perl programming, there is a way to
get around this proliferation of different DBMS with their special
ways of doing things and their special extensions of SQL. In this
chapter, I'll use the Perl DBI (database independent
interface) module, which provides a common interface to different
relational database systems; it makes it possible to write SQL that
will run on many different relational database systems with little or
no change.
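
As a minimal sketch of what this portability can look like, the following
program creates, populates, and queries a small table through DBI. It rests
on assumptions that are not part of this chapter: the DBD::SQLite driver is
installed, and the table name (genes), its columns, the gene labels, and
the two subroutine names are hypothetical choices made only for
illustration.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed. The data source name (DSN) is the
# only DBMS-specific string here; with a different driver installed you
# would pass a different DSN (plus a user name and password if required).
my $dsn = "dbi:SQLite:dbname=:memory:";

# Hypothetical helper: connect, create one table, and insert a few rows.
sub build_demo_db {
    my ($data_source) = @_;

    # RaiseError makes DBI die on any database error.
    my $dbh = DBI->connect($data_source, "", "",
        { RaiseError => 1, AutoCommit => 1 });

    $dbh->do("CREATE TABLE genes (name VARCHAR(20), organism VARCHAR(40))");

    # Placeholders (?) let the driver handle quoting of the values.
    my $insert = $dbh->prepare(
        "INSERT INTO genes (name, organism) VALUES (?, ?)");
    $insert->execute("geneA", "Homo sapiens");    # made-up labels
    $insert->execute("geneB", "Homo sapiens");
    $insert->execute("geneC", "Mus musculus");

    return $dbh;
}

# Hypothetical helper: count the rows stored for one organism.
sub count_genes {
    my ($dbh, $organism) = @_;
    my ($count) = $dbh->selectrow_array(
        "SELECT COUNT(*) FROM genes WHERE organism = ?", undef, $organism);
    return $count;
}

my $dbh = build_demo_db($dsn);
printf "Genes stored for Homo sapiens: %d\n",
    count_genes($dbh, "Homo sapiens");
$dbh->disconnect;
```

The SQL in this sketch is deliberately plain. Statements that rely on one
vendor's extensions are the ones likely to need rewriting when the DSN
changes.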

Still, unless you are subject to a decree, the problem that the Perl
bioinformatics programmer faces at the beginning of a project is,
"Which relational database system should I
use?" It depends on the computer
you're on and what DBMS is already in use,
available, paid for, or known locally. There are very expensive
systems, and there are free ones: we'll take a quick
look at some of the alternatives and use one of the most popular free
ones for the following examples.

The beginning programmer should be aware that relational databases
are a large field of endeavor. Stop at any local bookstore with a
good computer book section and you'll see an
impressive number of books dedicated to relational databases and SQL
in general, and especially dedicated to working with specific
relational database management systems such as Oracle, SQL Server
(the database, not the language), Sybase, MySQL, etc. There are books
devoted to specific tools for designing a database, managing a
database system, and programming a database system. In the workplace,
there are job titles and positions for people who specialize in these
three areas, and more.

So, don't expect this one chapter to reveal all. Do
expect it to explain the basic concepts, give you the lay of the
land, and demonstrate a practical example you can use as a template
as you begin to develop your own code. At the very least, you will
want access to the documentation for the particular database system
you will use in your own work.

Having given the obligatory warning, I'll also add
that, unless you are tackling a fairly large project, relational
databases aren't all that difficult to use. If they
were, they wouldn't be so popular. You may well
spend a lot of your programming time in the future dealing with
databases, or you may just spend a little. If you expect it will be a
lot, I recommend you do some further reading.

Relational databases are an important topic. Because they have their
own software systems, language, and concepts, they are a bit of an
additional challenge. Most bioinformaticians need to know the basics
of how to use them; some specialize in them. This chapter will get
you started.

