VISUAL QUICKSTART GUIDE SQL Second Edition [Electronic resources] نسخه متنی

"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

Normalization

It's possible to consolidate all information about books (or any entity type) into a single monolithic table, but that table would be loaded with duplicate data; each title (row) would contain redundant author, publisher, and royalty details. Redundancy is the enemy of database user and administrators: It causes databases to grow wildly large, it slows queries, and it's a maintenance nightmare. (When someone moves, you want to change her address in one place, not thousands of places.)

Redundancies lead to a variety of

update anomalies that is, difficulties with operations that insert, update, and delete rows.

Normalization is the processa series of stepsof modifying tables to reduce redundancy and inconsistency. After each step, the database is in a particular

normal form. The relational model defines three normal forms, named after famous ordinal numbers:

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)

Each normal form is stronger than its predecessors; a database in 3NF also is in 2NF and 1NF. Higher normalization levels tend to increase the number of tables relative to lower levels.

Lossless decomposition ensures that table splitting doesn't cause information loss, and

dependency-preserving decomposition ensures that relationships aren't lost. The matching primary-and foreign-key columns that appear when tables are split are not considered to be redundant data.

Normalization is not systematic; it's an iterative process that involves repeated table splitting and rejoining and refining until the database designer is (temporarily) happy with the result.

First normal form

A table in

first normal form:

Has columns that contain only atomic values
and
Has no repeating groups

atomic value, also called a

scalar value, is a single value that can't be subdivided (

Figure 2.16 ). A

repeating group is a set of two or more logically related columns (

Figure 2.17 ). To fix these problems, store the data in two related tables (

Figure 2.18 ).

Figure 2.16. In first normal form, each table's rowcolumn intersection must contain a single value that can't be subdivided meaningfully. The column `authors` in this table lists multiple authors and so violates 1NF.

title_id title_name authors
-------- -------------------------------- -------------
T01 1977! A01
T04 But I Did It Unconsciously

A03, A04
T11 Perhaps It's a Glandular Problem

A03, A04, A06

Figure 2.17. Redistributing the column `authors` into a repeating group also violates 1NF. Don't represent multiple instances of an entity as multiple columns.

title_id title_name author1 author2 author3
-------- -------------------------------- ------- ------- -------
T01 1977! A01
T04 But I Did It Unconsciously

A03 A04
T11 Perhaps It's a Glandular Problem

A03 A04 A06

Figure 2.18. The correct design solution is to move the author information to a new child table that contains one row for each author of a title. The primary key in the parent table is `title_id`, and the composite key in the child table is `title_id` and `au_id`.

A database that violates 1NF causes problems:

Multiple values in a rowcolumn intersection mean that the combination of table name, column name, and key value is insufficient to address every value in the database.
It's difficult to retrieve, insert, update, or delete a single value (among many) because you must rely on the order of the values.
Queries are complex (a performance killer).
The problems that further normalization solves become unsolvable.

Second normal form

Before I give the constraints for second normal form, I'll mention that a 1NF table automatically is 2NF if:

Its primary key is a single column (that is, the key isn't composite)
or
All the columns in the table are part of the primary key (simple or composite)

A table in

second normal form:

Is in first normal form
and
Has no partial functional dependencies

A table contains a

partial functional dependency if

some (but not all) of a composite key's values determine a nonkey column's value. A 2NF table is

fully functionally dependent, meaning that a nonkey column's value might need to be updated if

any column values in the composite key change.

The composite key in the table in

Figure 2.19 is title_id and au_id. The nonkey columns are au_order (the order in which authors are listed on the cover of a book with multiple authors) and au_phone (the author's phone number).

Figure 2.19. `au_phone` depends on `au_id` but not `title_id`, so this table contains a partial functional dependency and isn't 2NF.

Atomicity

Atomic values are

perceived to be indivisible from the point of view of database users. A date, telephone number, and a character string, for example, aren't really intrinsically indivisible because you can decompose the date into a year, month, and day; the phone number into a country code, area code, and subscriber number; and the string into its individual characters. What's important as far as you're concerned is that the DBMS provide operators and functions that let you extract and manipulate the components of "atomic" values if necessary, such as a substring() function to extract a telephone number's area code or a year() function to extract a date's year.

For each nonkey column, ask, "Can I determine a nonkey column value if I know only

part of the primary-key value?" A

no answer means the nonkey column is fully functionally dependent (good); a

yes answer means that it's partially functionally dependent (bad).

For au_order, the questions are:

Can I determine au_order if I know only title_id? No, because there might be more than one author for the same title.
Can I determine au_order if I know only au_id? No, because I need to know the particular title too.

Goodau_order is fully functionally dependent and can remain in the table. This dependency is written


{title_id, au_id}  {au_order}

and is read "title_id and au_id determine au_order" or "au_order depends on title_id and au_id". The

determinant is the expression to the left of the arrow.

For au_phone, the questions are:

Can I determine au_phone if I know only title_id? No, because there might be more than one author for the same title.
Can I determine au_phone if I know only au_id? Yes! The author's phone number doesn't depend upon the title.

Badau_phone is partially functionally dependent and must be moved elsewhere (probably to an authors or phone_numbers table) to satisfy 2NF rules.

Third normal form

A table in

third normal form:

Is in second normal form
and
Has no transitive dependencies

A table contains a

transitive dependency if a nonkey column's value determines another nonkey column's value. In 3NF tables, nonkey columns are mutually independent and dependent on only primary-key column(s). 3NF is the next logical step after 2NF.

The primary key in the table in

Figure 2.20 is title_id. The nonkey columns are price (the book's price), pub_city (the city where the book is published), and pub_id (the book's publisher).

Figure 2.20. `pub_city` depends on `pub_id`, so this table contains a transitive dependency and isn't 3NF.

For each nonkey column, ask, "Can I determine a nonkey column value if I know

any other nonkey column value?" A

no answer means that the column is not transitively dependent (good); a

yes answer means that the column whose value you can determine is transitively dependent on the other column (bad).

For price, the questions are:

Can I determine pub_id if I know price? No.
Can I determine pub_city if I know price? No.

For pub_city, the questions are:

Can I determine price if I know pub_city? No.
Can I determine pub_id if I know pub_city? No, because a city might have many publishers.

For pub_id, the questions are:

Can I determine price if I know pub_id? No.
Can I determine pub_city if I know pub_id? Yes! The city where the book is published depends on the publisher.

Badpub_city is transitively dependent on pub_id and must be moved elsewhere (probably to a publishers table) to satisfy 3NF rules.

As you can see, it's not enough to ask, "Can I determine A if I know B?" to discover a transitive dependency; you also must ask, "Can I determine B if I know A?"

Other normal forms

Higher levels of normalization exist, but the relational model doesn't require (or even mention) them. They're simply useful to avoid redundancy. Briefly, they are:

Boyce-Codd normal form is a more rigorous version of 3NF. BCNF deals with tables that have multiple candidate keys, composite candidate keys, or candidate keys that overlap. A table is in BCNF if every determinant is a candidate key. (A determinant column is one on which some of the columns are fully functionally dependent.)
A table in
fourth normal form is in BCNF and has no multivalued dependencies (MVDs). An MVD occurs when in a table containing at least three columns, one column has multiple rows whose values match a value of a single row of one of the other columns.
Suppose that employees can be assigned to multiple projects and each employee can have multiple skills. If you stuff all this information into a single table, you must use all three attributes as the key because nothing less can identify a row uniquely. The relationship between emp_id and proj_id is an MVD because for each pair of emp_id/skill_id values in the table only emp_id (independent of skill_id) determines the associated set of proj_id values. The relationship between emp_id and skill_id also is an MVD because the set of skill values for an emp_id/proj_id pair always depends on only emp_id. To transform a table with MVDs to 4NF, move each MVD pair to a new table.
A table in
fifth normal form is in 4NF and has no join dependencies, which are generalizations of MVDs. The aim of 5NF is to have tables that can't be decomposed further into any number of smaller tables. The redundancies and anomalies that 5NF cures are rare and unintuitive. In real databases, you'll see 1NF, 2NF, 3NF, and occasionally 4NF. 4NF and even 3NF tables almost always are 5NF too.

Denormalization

The increased number of tables that normalization generates might sway you to

denormalize your database to speed queries (because fewer tables reduces computationally expensive joins and disk I/O). This common technique trades off data integrity for performance and presents a few other problems. A denormalized database:

Usually is harder to understand than a normalized one
Usually makes retrievals faster but updates slower
Increases the risk of inserting inconsistent data
Might improve the performance of some database applications but hurt that of others (because users' table-access patterns change over time)

The need for denormalization isn't a weakness in the relational model but reveals a flawed implementation of the model in DBMSes.

VISUAL QUICKSTART GUIDE SQL Second Edition [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی