Web Database Applications With Php And Mysql (2nd Edition) [Electronic resources] نسخه متنی

5.6 Join Queries

You'll often want to
output data that's based on relationships between
two or more tables. For example, in the
winestore database, you might want to know which
customers have placed orders, which customers live in Australia, or
how many bottles of wine Lucy Williams has bought. These are examples
of join queries, queries that match rows between
tables based (usually) on primary key values. In SQL, a
join query matches rows from two or more tables
based on a condition in a WHERE clause and outputs
only those rows that meet the condition.

As part of the process of converting the winestore
entity-relationship model to SQL statements, we've
included the attributes required in any practical join condition. To
understand which tables can be joined in the
winestore database, and how the joins are
processed, it's helpful to have a copy of the ER
model at hand as you work your way through this section.

5.6.1 Beware of the Cartesian Product

Suppose you want to find
out the names of the wineries in the winestore
database and, for each winery, the name of the region that
it's located in. To do this, you examine the ER
model and discover that the region and
winery tables are related, and that they both
contain attributes that you need in the answer to your query.
Specifically, you need to retrieve the winery_name
attribute from the winery table and the
region_name attribute from the
region table, and you need to join the two
tables together to find the result.

Consider this query, which we might intuitively, but wrongly, use to
find all the wineries in a region:

SELECT winery_name, region_name FROM winery, region;

This query produces (in part) the following results:

+-------------------------------+-------------+
| winery_name                   | region_name |
+-------------------------------+-------------+
| Durham and Sons Premium Wines | Coonawarra  |
| Durham Brook Group            | Coonawarra  |
| Durham Creek                  | Coonawarra  |
| Durham Estates                | Coonawarra  |
| Durham Hill Vineyard          | Coonawarra  |

The impression here is that, for example, Durham Creek winery is
located in the Coonawarra region. This might not be the case. Why?
First, you can use the techniques covered so far in this chapter to
check which region the Durham Creek winery is located in:

SELECT region_id FROM winery WHERE winery_name='Durham Creek';

The result is:

+-----------+
| region_id |
+-----------+
|         9 |
+-----------+
1 row in set (0.01 sec)

Now, you can query the region table to find the
name of the region using:

mysql> SELECT region_name FROM region WHERE region_id=9;
+----------------+
| region_name    |
+----------------+
| Margaret River |
+----------------+
1 row in set (0.00 sec)

So, Durham Creek winery isn't in Coonawarra at all!

What happened in the first attempt at a join query? The technical
answer is that you just evaluated a
Cartesian product:
you produced as output all the possible combinations of
wineries and regions, most of which don't make any
sense. These odd results can be seen if you add an ORDER BY clause to the original query:

SELECT winery_name, region_name FROM winery, region 
ORDER BY winery_name, region_name;

Recall that the ORDER BY clause sorts the results
after the query has been evaluated and that it has no effect on which
rows are returned from the query. Here is the first part of the
output:

+---------------------------------+---------------------+
| winery_name                     | region_name         |
+---------------------------------+---------------------+
| Anderson and Sons Premium Wines | All                 |
| Anderson and Sons Premium Wines | Barossa Valley      |
| Anderson and Sons Premium Wines | Coonawarra          |
| Anderson and Sons Premium Wines | Goulburn Valley     |
| Anderson and Sons Premium Wines | Lower Hunter Valley |
| Anderson and Sons Premium Wines | Margaret River      |
| Anderson and Sons Premium Wines | Riverland           |
| Anderson and Sons Premium Wines | Rutherglen          |
| Anderson and Sons Premium Wines | Swan Valley         |
| Anderson and Sons Premium Wines | Upper Hunter Valley |

The query produces all possible combinations of the 10 region names
and 300 wineries in the sample database! In fact, the number of rows
output is the total number of rows in the first table multiplied by
the total rows in the second table. In this case, the output is 10 x
300 = 3,000 rows.

5.6.2 Elementary Natural Joins

A cartesian product
isn't the join we want. Instead, we want to limit
the results to only the sensible rows, where the winery is actually
located in the region. To do this, you need to understand how the
relationship between the region and
winery tables is maintained. If you examine the
ER model, you'll see that many wineries are located
in a region.

In the database tables, the relationship between the
winery and region tables is
maintained using the primary key of the region
table, the attribute region_id
that's also an attribute in the
winery table. To understand this, consider the
first three rows from the winery table:

mysql> SELECT * FROM winery LIMIT 3;
+-----------+--------------------------+-----------+
| winery_id | winery_name              | region_id |
+-----------+--------------------------+-----------+
|         1 | Hanshaw Estates Winery   |         2 |
|         2 | De Morton and Sons Wines |         5 |
|         3 | Jones's Premium Wines    |         3 |
+-----------+--------------------------+-----------+
3 rows in set (0.04 sec)

The first winery has a region_id of 2, the second
a region_id of 5, and the third a
region_id of 3. Consider now the first five rows
of the region table:

mysql> SELECT * FROM region LIMIT 5;
+-----------+---------------------+
| region_id | region_name         |
+-----------+---------------------+
|         1 | All                 |
|         2 | Goulburn Valley     |
|         3 | Rutherglen          |
|         4 | Coonawarra          |
|         5 | Upper Hunter Valley |
+-----------+---------------------+
5 rows in set (0.04 sec)

If you match up each winery's
region_id value with a region's
region_id value, you can determine the
relationship and answer the query. For example, you can now see that
the first winery (Hanshaw Estates Winery) is located in region 2, the
Goulburn Valley.

From a querying perspective, we want to output
winery_name and region_name
values where the region_id in the
winery table matches the corresponding
region_id in the region
table. This is a natural join.

You can perform a natural join on the winery and
region tables using:

SELECT winery_name, region_name FROM winery NATURAL JOIN region
ORDER BY winery_name;

The query produces (in part) the following sensible results:

+---------------------------------+---------------------+
| winery_name                     | region_name         |
+---------------------------------+---------------------+
| Anderson and Sons Premium Wines | Coonawarra          |
| Anderson and Sons Wines         | Coonawarra          |
| Anderson Brothers Group         | Rutherglen          |
| Anderson Creek Group            | Riverland           |
| Anderson Daze Group             | Rutherglen          |
| Anderson Daze Vineyard          | Margaret River      |
| Anderson Daze Wines             | Barossa Valley      |
| Anderson Ridge Wines            | Lower Hunter Valley |

A natural join query relies on the DBMS matching attributes with the
same name across the two tables. In this example, MySQL discovers
that there's a region_id attribute in the
winery and region tables,
and it only outputs combinations where the
region_id in both
tables is the same.

You can write a join query that explicitly specifies which attributes
should be matched to produce the correct result. The following query
uses a WHERE clause to produce identical results
to our previous example:

SELECT winery_name, region_name FROM winery, region 
WHERE winery.region_id = region.region_id 
ORDER BY winery_name;

We recommend writing out your joins so that they include the join
condition in the WHERE clause. This is safer and
clearer than relying on the NATURAL JOIN operator
to discover common attribute names across tables and allowing the
DBMS to figure out how the join is done.

Several features are shown in this second example:

The FROM clause contains the two table names
winery and
region, and so retrieves rows from
both tables.

Attributes in the WHERE clause are specified using
both the table name and attribute name, separated by a period. This
is useful because the same attribute name is often used in different
tables, and the query can't figure out which table
is meant unless you include it. When an attribute name occurs in only
one table, you can omit the table name.

In this example, region_id in the
region table and region_id in
the winery table have to be specified
unambiguously as region.region_id and
winery.region_id. In contrast,
winery_name and region_name
don't need the table name because they occur only in
the winery and region
tables respectively.

The use of both the table and attribute name can also be used for
clarity in queries, even if it isn't required. So,
for example, you could write winery.winery_name in
the example query. It can also be used in all parts of the query, not
just the WHERE clause.

The WHERE clause includes a join clause that
matches rows between the multiple tables. In this example, the output
is reduced to those rows where wineries and regions have matching
region_id attributes, resulting in a list of all
wineries and which region they are located in. This is the key to
joining two or more tables to produce sensible results.

5.6.2.1 Examples

A join can be used to find lots of useful information from the
winestore database. Suppose we want to find the
names of wineries and the wines they make. Again, after examining the
ER model, you'll see that you need to join together
the related wine and winery
tables to get the required names. Here's the query
you'd need to write to get the correct result:

SELECT winery_name, wine_name FROM winery, wine 
WHERE wine.winery_id = winery.winery_id;

This query joins the winery and
wine tables by matching the
winery_id attributes. The result is the names and
wineries of the 1,048 wines stocked at the winestore.

You can extend this query to produce a list of wines made by a
specific winery or group of wineries. For example, to find all wines
made by wineries with a name beginning with Borg, use:

SELECT winery_name, wine_name FROM winery, wine 
WHERE wine.winery_id = winery.winery_id 
AND winery.winery_name LIKE 'Borg%';

The LIKE clause is discussed in detail in Chapter 15.

Here are two more example join queries:

To find the name of the region that the Ryan Ridge Winery is situated
in:

SELECT region_name FROM region, winery 
WHERE winery.region_id=region.region_id 
AND winery_name='Ryan Ridge Winery';

To find which wineries make Tonnibrook wines:

SELECT winery_name FROM winery, wine 
WHERE wine.winery_id=winery.winery_id 
AND wine_name='Tonnibrook';

5.6.2.2 Using DISTINCT in joins

The next example uses the
DISTINCT
operator to find wines that cost less
than $10:

SELECT DISTINCT wine.wine_id FROM wine, inventory 
WHERE wine.wine_id=inventory.wine_id AND cost<10;

Wines can have more than one inventory row, and the
DISTINCT operator shows each
wine_id once by removing any duplicates.

Here are two examples that use DISTINCT to show
only one matching answer:

To find which countries the customers live in:

SELECT DISTINCT country FROM customer, countries 
WHERE customer.country_id = countries.country_id;

To find which customers have ordered wines:

SELECT DISTINCT surname,firstname FROM customer,orders 
WHERE customer.cust_id = orders.cust_id
ORDER BY surname,firstname;

5.6.3 Joins with More than Two Tables

Queries can join more than two
tables. Suppose you want to find the details of the wine purchases
made by a customer, including the customer's
details, the dates they made an order, and the quantity and price of
the items purchased. You examine the ER model, and see that the
customer table that contains the customer
information is related to the orders table that
contains the date, and the orders table is
related to the items table that contains the
quantities and prices. So, to get the information you need, you have
to join all three tables together.

By examining the database structure or the CREATE TABLE statements, you can see that the
cust_id attribute can be used to join together the
customer and the orders
table. Joining the orders table and
items table is a little trickier: the primary
key of the orders table isn't
just the order_id, it's both the
cust_id and the order_id. So,
for example there are many rows with an order_id
of 1, but what makes a row unique is the combination of the
cust_id for a customer and the
order_id. These two attributes together are used
to join the orders and
items tables.

Suppose now that we want run this query for customer #2.
Here's the query you'd use:

SELECT * FROM customer, orders, items
WHERE customer.cust_id = orders.cust_id AND 
orders.order_id = items.order_id AND
orders.cust_id = items.cust_id AND customer.cust_id = 2;

The WHERE clause contains the join condition
between the three tables, customer,
orders, and items, and the
rows selected are those in which the cust_id is
the same for all three tables, the cust_id is 2,
and the order_id is the same in the
orders and items tables.
The example illustrates how frequently the Boolean operators
AND and OR are used.

If you remove the cust_id=2 clause, the query
outputs all items from all orders by all customers. This is a large
result set, but still a sensible one that is much smaller than the
cartesian product!

Here are two more examples that join three tables:

To find which wines are made in the Margaret River region:

SELECT wine_id FROM wine, winery, region
WHERE wine.winery_id=winery.winery_id AND
winery.region_id=region.region_id AND 
region.region_name='Margaret River';

To find which region contains the winery that makes wine #28:

SELECT region_name FROM wine, winery, region
WHERE wine.winery_id=winery.winery_id AND
winery.region_id=region.region_id AND
wine.wine_id=28;

Extending to four or more tables generalizes the approach further. To
find the details of customers who have purchased wines from the Ryan
Estates Group winery, use:

SELECT DISTINCT customer.cust_id, surname, firstname 
FROM customer, winery, wine, items
WHERE customer.cust_id=items.cust_id AND
items.wine_id=wine.wine_id AND
wine.winery_id=winery.winery_id AND
winery.winery_name='Ryan Estates Group'
ORDER BY surname, firstname;

This
query is the
most complex so far and has four parts. The easiest way to understand
a query is usually to start at the end of the
WHERE
clause and work toward the
SELECT clause:

The WHERE clause restricts the
winery rows to the Ryan Estates Group (which, in
this case, only matches one winery).

The resultant winery row is joined with the
wine table to find all wines made by the Ryan
Estates Group.

The wines made by Ryan Estates Group are joined with the
items that have been purchased by joining to the
items table.

The purchased wines are then joined with the
customer rows to find the purchasers. You can
leave out the orders table, because the
items table contains a
cust_id for the join; if you need the order number
or credit card number (or another orders
attribute), the orders table needs to be
included in the query.

The result is the details of customers who have purchased Ryan
Estates Group wines. The
DISTINCT
clause is used to show each customer
only once. ORDER BY sorts the
customer rows into telephone directory order.

Designing a query like this is a step-by-step process. We began by
testing a query to find the winery_id of wineries
with the name Ryan Estates Group. Then, after testing the query and
checking the result, we progressively added additional tables to the
FROM clause and the join conditions. Finally, we
added the ORDER
BY clause.

The next example uses three tables. It queries the complex
many-to-many relationship that exists between the
wines and grape_variety
tables via the wine_variety table. A wine can
have one or more grape varieties and these are listed in a specific
order (e.g., Cabernet, then Sauvignon). From the other perspective, a
grape variety such as Cabernet can be in hundreds of different wines.
The many-to-many relationship is managed by creating an intermediate
table between grape_variety and
wine called wine_variety.
The id attribute value stored in that table
represents the order in which the grape varieties should appear for
the wine. You can find a longer discussion of how these tables were
designed and how they're used in Appendix E.

Here is the example query that joins the three tables to find what
grape varieties are in wine #1004:

SELECT variety FROM grape_variety, wine_variety, wine
WHERE wine.wine_id=wine_variety.wine_id AND
wine_variety.variety_id=grape_variety.variety_id AND
wine.wine_id=1004
ORDER BY wine_variety.id;

The result of the query is:

+-----------+
| variety   |
+-----------+
| Cabernet  |
| Sauvignon |
+-----------+
2 rows in set (0.00 sec)

The join condition is the same as any three-table query. The only
significant difference is the ORDER
BY clause that presents the results in
id order (the first listed variety was stored with
ID=1, the second ID=2, and so
on).

Web Database Applications With Php And Mysql (2nd Edition) [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی