l xmlns="http://www.w3.org/1999/l">
NormalizationIt's possible to consolidate all information about books (or any entity type) into a single monolithic table, but that table would be loaded with duplicate data; each title (row) would contain redundant author, publisher, and royalty details. Redundancy is the enemy of database user and administrators: It causes databases to grow wildly large, it slows queries, and it's a maintenance niare. (When someone moves, you want to change her address in one place, not thousands of places.) Redundancies lead to a variety of update anomalies that is, difficulties with operations that insert, update, and delete rows. Normalization is the processa series of stepsof modifying tables to reduce redundancy and inconsistency. After each step, the database is in a particular normal form. The relational model defines three normal forms, named after famous ordinal numbers:
Each normal form is stronger than its predecessors; a database in 3NF also is in 2NF and 1NF. Higher normalization levels tend to increase the number of tables relative to lower levels. Lossless decomposition ensures that table splitting doesn't cause information loss, and dependency-preserving decomposition ensures that relationships aren't lost. The matching primary-and foreign-key columns that appear when tables are split are not considered to be redundant data. Normalization is not systematic; it's an iterative process that involves repeated table splitting and rejoining and refining until the database designer is (temporarily) happy with the result. First normal formA table in first normal form:
An atomic value, also called a scalar value, is a single value that can't be subdivided ( Figure 2.16 ). A repeating group is a set of two or more logically related columns ( Figure 2.17 ). To fix these problems, store the data in two related tables ( Figure 2.18 ). Figure 2.16. In first normal form, each table's rowcolumn intersection must contain a single value that can't be subdivided meaningfully. The column authors in this table lists multiple authors and so violates 1NF.title_id title_name authors -------- -------------------------------- ------------- T01 1977! A01 T04 But I Did It Unconsciously Figure 2.17. Redistributing the column authors into a repeating group also violates 1NF. Don't represent multiple instances of an entity as multiple columns.title_id title_name author1 author2 author3 -------- -------------------------------- ------- ------- ------- T01 1977! A01 T04 But I Did It Unconsciously Figure 2.18. The correct design solution is to move the author information to a new child table that contains one row for each author of a title. The primary key in the parent table is title_id, and the composite key in the child table is title_id and au_id.A database that violates 1NF causes problems:
Second normal formBefore I give the constraints for second normal form, I'll mention that a 1NF table automatically is 2NF if:
A table in second normal form:
A table contains a partial functional dependency if some (but not all) of a composite key's values determine a nonkey column's value. A 2NF table is fully functionally dependent, meaning that a nonkey column's value might need to be updated if any column values in the composite key change. The composite key in the table in Figure 2.19 is title_id and au_id. The nonkey columns are au_order (the order in which authors are listed on the cover of a book with multiple authors) and au_phone (the author's phone number). Figure 2.19. au_phone depends on au_id but not title_id, so this table contains a partial functional dependency and isn't 2NF.
For each nonkey column, ask, "Can I determine a nonkey column value if I know only part of the primary-key value?" A no answer means the nonkey column is fully functionally dependent (good); a yes answer means that it's partially functionally dependent (bad). For au_order, the questions are:
Goodau_order is fully functionally dependent and can remain in the table. This dependency is written
{title_id, au_id} and is read "title_id and au_id determine au_order" or "au_order depends on title_id and au_id". The determinant is the expression to the left of the arrow. For au_phone, the questions are:
Badau_phone is partially functionally dependent and must be moved elsewhere (probably to an authors or phone_numbers table) to satisfy 2NF rules. Third normal formA table in third normal form:
A table contains a transitive dependency if a nonkey column's value determines another nonkey column's value. In 3NF tables, nonkey columns are mutually independent and dependent on only primary-key column(s). 3NF is the next logical step after 2NF. The primary key in the table in Figure 2.20 is title_id. The nonkey columns are price (the book's price), pub_city (the city where the book is published), and pub_id (the book's publisher). Figure 2.20. pub_city depends on pub_id, so this table contains a transitive dependency and isn't 3NF.For each nonkey column, ask, "Can I determine a nonkey column value if I know any other nonkey column value?" A no answer means that the column is not transitively dependent (good); a yes answer means that the column whose value you can determine is transitively dependent on the other column (bad). For price, the questions are:
For pub_city, the questions are:
For pub_id, the questions are:
Badpub_city is transitively dependent on pub_id and must be moved elsewhere (probably to a publishers table) to satisfy 3NF rules. As you can see, it's not enough to ask, "Can I determine A if I know B?" to discover a transitive dependency; you also must ask, "Can I determine B if I know A?" Other normal formsHigher levels of normalization exist, but the relational model doesn't require (or even mention) them. They're simply useful to avoid redundancy. Briefly, they are:
|