Oracle SQLPlus [Electronic resources] : The Definitive Guide, 2nd Edition نسخه متنی

4.4 Table Joins

It's often the case that the data you want to return
from a query are spread across more than one table. Commonly this
occurs as a result of master-detail relationships such as the one
between the project and
project_hours tables. The
project table contains one row describing each
project, whereas, for each project, project_hours
contains many rows detailing time charged to that project. To link
such master and detail data together in the results from a SELECT
statement, you use a relational operation called a
join .

4.4.1 Inner Joins

Imagine that you need to generate a detailed listing of time charged
to each of your projects. You want that listing to include project
name, the date on which work was performed, and the number of hours
involved. What makes this request interesting is that you need to
list two tables in your FROM clause. Project names come from the
project table, while dates and hours come from the
project_hours table. How do you SELECT from two
tables at once? The answer is to use a JOIN clause such as the one in
Example 4-25.

Example 4-25. Selecting related rows from two tables

SELECT p.project_name, ph. employee_id,
ph.time_log_date, ph.hours_logged
FROM project p JOIN project_hours ph
ON p.project_id = ph.project_id;

The JOIN clause, which is a subclause of the FROM clause, is new in
Oracle9 i Database and specifies that data are to
be drawn from not one table, but from two, in this case from the
project and project_hours
tables. The ON clause, subsidiary to JOIN, defines the conditions
that rows of combined data must meet to be returned from the join
operation. This will make more sense as you scrutinize Figures Figure 4-2 through Figure 4-4.

Conceptually, all joins begin as a set of all possible combinations
of rows from the two tables involved. Such a set is known as a
Cartesian product . Figure 4-2
shows the beginning of a join between a three-row
project table and a four-row
project_hours table.

Databases implement all kinds of optimizations to avoid generating
Cartesian products. However, understanding the conceptual process
defining the join operation is critical to writing accurate SQL
queries.

Each row from project has been combined with each
row from project_hours. I've
highlighted the JOIN clause in the query to show how much of the
query has been executed so far.

Looking at Figure 4-2, you can see that not all
combinations make sense. Detail from project 1004 has no business
being associated with project 1002, nor does the reverse make sense.
Project 1003 should have no detail at all. This is where the
join conditions in the ON clause come into play.
When you write a join, you should write join conditions to identify
those rows in the Cartesian product that make sense, that you wish
returned from the join operation. Rows not matching the join
conditions are filtered out. Think of a sieve filtering out large
gravel and passing only the sand. Figure 4-3 shows
the results of evaluating the join condition in the example query.

Figure 4-2. All joins begin as a Cartesian product

Figure 4-3. Join conditions pass only those row combinations that make sense, eliminating all the others

As Figure 4-3 illustrates, the next step after
forming the Cartesian product is to apply the conditions in the ON
clause, using those to eliminate row combinations that make no
business sense. Look at the project ID numbers in Figure 4-3s results. Each row output
from the join operation corresponds to a row from
project_hours. Each row contains corresponding
project information (name and ID) from project. At
this intermediate stage, you can think of all columns from both
tables as being present in each row. Thus, each row has two copies of
the project ID number, one from each table, and those two values are
used (see the ON clause) to identify those row combinations that make
business sense.

The join illustrated in Figure 4-3 is known as an
inner join . In fact, you can write it using the
keywords INNER JOIN instead of just JOIN. The key characteristic of
an inner join is that each row of output from a join operation
contains values from both tables. The choice of the term
inner join is unfortunate because that term is
not at all evocative of the results produced.

It's entirely possible to place WHERE conditions in
a query containing a join and this is often done. Such WHERE
conditions are evaluated after all the joins.
Figure 4-4 shows the same query as Figure 4-3 but with the addition of a WHERE clause. You
can see the effect of WHERE and when that clause is evaluated. You
can see that the generation of columns in the SELECT list is the
final operation in the query's execution.

Figure 4-4. WHERE clause conditions are evaluated following all joins

The process I've described in this section is
conceptual. It's a good way to think about joins
that will help you write better queries and to write them more
easily. However, this conceptual process is seldom efficient when
applied in real life. Instead, databases implement many techniques to
optimize joins and query executions. For example, with respect to the
WHERE clause in Figure 4-4, a
database's query optimizer may
"decide" to evaluate that condition
early, eliminating rows from project_hours for
February before the join, not after it.

Don't get caught up in thinking about query
optimizations when writing join queries. Think conceptually until
you've written a query that yields the correct
results. No matter how a database optimizes a query, especially a
join query, in the end the results must match
the conceptual model I've just described. All joins
begin with a Cartesian product of all possible combinations of rows
from two tables. Join conditions then eliminate unwanted rows.
It's that simple.

If you're interested in the topic of join
optimization, particularly for queries containing large numbers of
joins, I heartily recommend and endorse Dan Tow's
book, SQL Tuning
(O'Reilly).

4.4.2 Outer Joins

Go back to Figure 4-3 and look at the rows feeding
into and out of the ON clause evaluation. What happened to project
1003? Rows referencing that project existed in the
project table, were part of the Cartesian product,
but were completely eliminated by the ON clause. Why? Because the
project rows for 1003 had no counterparts in the
project_hours table. As a result, no rows were in
the Cartesian product for project 1003 in which
both project_id values
matched, and thus, project 1003 completely vanished from the
query's result set. This behavior is not always what
you want.

To join project to
project_hours using the data shown in Figure 4-3, yet preserving projects having no
corresponding detail in project_hours, you need to
use an outer join . An outer join designates one
table as optional and the other as an anchor. Rows from the anchor
table in an outer join are preserved even when no corresponding
detail rows exist.

Figure 4-5 shows a left outer
join , designating the table on the left-hand side of the
JOIN keyword as the anchor table. The project
table is to the left, so project rows are
preserved.

Figure 4-5. An outer join preserving a row with no detail

Look carefully at what goes on in Figure 4-5. The
condition in the ON clause results in the elimination of all rows in
the Cartesian product associated with the row from the
project table for project 1003. This is when the
outer join logic kicks in. The four combination rows for project 1003
are, in fact, removed from the join, but a new row is slipped in for
project 1003 containing values only from the
project table. Column values that would otherwise
come from project_hours are set to null.

Remember that my explanation is conceptual, a useful way of
visualizing the outer join process. How the database engine
implements this conceptual process may be quite different from what
I've described.

Another type of join is the right outer join .
Right and left outer joins are the same fundamental operation. The
only difference lies in which side of the JOIN keyword you place the
anchor table. Example 4-26 shows semantically
identical left and right outer joins.

Example 4-26. Left and right outer joins are fundamentally the same operation

SELECT p.project_name, ph.time_log_date, ph.hours_logged
FROM project p LEFT OUTER JOIN project_hours ph
ON p.project_id = ph.project_id;
SELECT p.project_name, ph.time_log_date, ph.hours_logged
FROM project_hours ph RIGHT OUTER JOIN project p
ON p.project_id = ph.project_id;

Example 4-26s first query is a left
outer join: project is to the left, so
project is the anchor table. The second query is a
right outer join, but I've moved
project to the right-hand side of the JOIN
keyword, so it is still the anchor table. Both queries generate the
same result.

Old Join Syntax

The join syntax I show in this chapter is that introduced in the 1992
ANSI/ISO SQL standard. You'll frequently encounter
an older syntax in which JOIN and ON clauses were not used and which
placed join conditions in the WHERE clause. For example, the
following two inner joins are semantically equivalent:

SELECT p.project_name, ph.time_log_date, ph.hours_logged
FROM project p JOIN project_hours ph
ON p.project_id = ph.project_id;
SELECT p.project_name, ph.time_log_date, ph.hours_logged
FROM project p, project_hours ph
WHERE p.project_id = ph.project_id;

In the second join, a comma delimits the table names in the FROM
clause. That comma notation generates the Cartesian product. The
conditions in the WHERE clause reduce that Cartesian product
(assuming you've written your conditions properly)
to a set of rows that makes business sense.

Oracle supports an older outer join syntax in which the optional
table is designated by placing a (+) following
each reference to a column from that table in the WHERE clause:

SELECT p.project_name, ph.time_log_date, ph.hours_logged
FROM project p, project_hours ph
WHERE p.project_id = ph.project_id(+);

You'll see these older forms of join often,
especially in older code. Avoid them in new code that you write. The
newer notation is much clearer, and it almost eliminates the
possibility of accidentally returning the entire Cartesian product as
the result from a join.

4.4.3 Full Outer Joins

The final type of outer join that I want to discuss is the
full outer join . Think of a full outer join as
the combination of all three types of joins so far: inner, left, and
right:

You get all the rows that you would get from an INNER JOIN (e.g.,
projects and their attendant detail):

project p INNER JOIN project_hours ph
Plus you get the additional rows that would
result from making that inner join into a LEFT OUTER JOIN (e.g.,
projects without any detail):

project p LEFT OUTER JOIN project_hours ph
Plus you further get those additional rows that would result from
RIGHT OUTER JOIN, but keeping the same ordering of the tables (e.g.,
detail without any projects):

project p RIGHT OUTER JOIN project_hours ph
A full outer join will return at least one of each row from both
tables. You would typically then have WHERE clause conditions to
further filter the results.

Oracle Database 10 g introduced yet another type
of join, the partition outer join . Read about
that at http://gennick.com/partitionl.

Oracle SQLPlus [Electronic resources] : The Definitive Guide, 2nd Edition نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی