Mastering Perl for Bioinformatics [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Mastering Perl for Bioinformatics [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید










6.3 Relational Database Definitions


A relational
database is essentially a set of
tables
(relations)
that are interrelated in certain ways. A table is a two-dimensional
matrix with a name. Each table is composed of
rows
(tuples),
and no two rows can have exactly the same values. Each row is
composed of named fields
(attributes); each field has a certain
data type and a
name, and a field can only contain one value.[3]

[3] As the
name "relation" suggests to the
mathematically inclined, each table represents a subset of the
Cartesian product of the domains of the fields, in which each row is
an element of the relation. However, the order of the fields in a
table is not significant (because each field has a unique name), and
that's an important departure from the standard
set-theoretic definition of a relation.


For instance, a table may be defined with the fields Name as a
character string of at most 50 characters, an ID as an integer, and a
Date as a special date datatype. Each row then has a Name, ID, and
Date value. Table 6-1 and Table 6-2 show a table called genename with three fields
and three rows, and one called organism with two fields and five
rows.

Table 6-1. genename


Name


ID


Date


aging


118


1984-07-13


wrinkle


9223


1987-08-15


hairy


273


1990-09-30

Table 6-2. organism


Organism


Gene


human


118


human


9223


mouse


9223


mouse


273


worm


118

You'll notice that the tables each have a field with
the same set of values as the other; ID in Table 6-1 and Gene in Table 6-2 use
values from the same domain. (A typical domain is the set of positive
integers, for example.) Such shared fields can be used to join
information from two or more tables. For instance, given a gene Name
from the Table 6-1, I can find its associated ID,
and look for that value in the Gene field of the Table 6-2 to find what organism or organisms have a
version of that gene.

In Table 6-1, the wrinkle gene has ID value 9223,
and in Table 6-2, there are two entries in field
Gene with value 9223, namely human and mouse.

Each
row in a relation is unique and can therefore serve as its own unique
identifier for the row. There may also be some smaller group of
fields that, together, are unique for each row and for which removing
any field destroys the uniqueness; such a group is a
candidate key. Very often tables are defined so that
each row has its own unique ID field (it may be called something
besides ID) that alone may serve as a key; this usually is
recommended.

In any event, some candidate key is designated as the
primary key for the table.

If in another table there is a field with the same domain as the
primary key, it can relate the information in the two tables. That
field, in the other table, is called a foreign
key,
and the primary key's table is called the foreign
key's home
relation. In MySQL, foreign keys
aren't specified as such; they are used only in
joins, as you'll see.

In defining fields, you may also specify that no row has a null
(undefined) entry in a field; this is required for primary key
fields. Similarly, a foreign key is required to refer to an actually
existing value in some field in its home relation. These constraints
on the data are known as
entity integrity and
referential
integrity.

Using some SQL statements, you can write a (very short) program that,
given a gene name, returns the list of organisms in which a version
of that gene is found (in this database). In fact,
I'll do just that in the next section.

That's simple enough. However, there is a bit more
to learn about designing tables that work well, implementing the
tables in SQL, and writing the (Perl) programs that send SQL
statements to the DBMS and that compute and display the results.
These areas of database design, implementation, and application
development, are often staffed by specialists in large projects. Each
specialization has its own techniques and lore; each can be a
full-time job, and there are even more areas of specialization than
these. Most bioinformatics programmers know enough about each area to
design and implement a web-based, database-driven interface to their
laboratory, which is the basic skill set that I'm
attempting to impart. Assuming you are a beginner interested in
learning the ropes, the following sections and chapters will
hopefully give you enough of a jump start to be able to tackle all
these tasks for a small project and prepare you to attempt bigger
jobs.


/ 156