Oracle SQLPlus [Electronic resources] : The Definitive Guide, 2nd Edition نسخه متنی

4.7 Subqueries

A subquery is a SELECT statement embedded within
another SQL statement. You can use subqueries in place of column
references, in place of table references, to generate values for
comparison in a WHERE or HAVING condition, and to generate values for
inserts and updates.

4.7.1 Treating Rowsets as Tables

So far in this chapter, you've selected data from
tables. A SELECT statement is executed, and a set of rows comes back
as the result. Imagine if you could further treat that set of rows as
a table against which you issue another SELECT statement or perhaps
an UPDATE or a DELETE. Using a subquery, you can do just what
I've described, and that can lead to some
interesting and elegant solutions to SQL problems.

One use for a subquery in the FROM clause is to aggregate data that
has already been aggregated. For example, you might be faced with the
following business problem:

Find all employees who have worked on projects 1001 and 1002 during
the year 2004. Sum the number of hours each of those employees has
worked on each project during that year. Report the ranges. Show the
lowest number of hours that any employee worked on each project
during 2004, as well as the highest number of hours.

This is an interesting problem to solve because you must sum the
hours for employee/project combinations using the SUM aggregate
function, and you must apply MIN and MAX to your sums. Example 4-32 shows one way to approach this problem using a
subquery.

Example 4-32. Aggregating aggregated data

SELECT phs.project_id, MIN(phs.hours), MAX(phs.hours)
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours
WHERE project_id IN (1001, 1002)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004',
'dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id;
PROJECT_ID MIN(PHS.HOURS) MAX(PHS.HOURS)
---------- -------------- --------------
1001 4 20
1002 8 24 The subquery appears in the FROM clause enclosed in parentheses. The
outer query treats the rows from the subquery in the same manner as
it would treat rows from a table. Tables have names, and so should
subqueries. Example 4-32 gives the name, or alias,
phs to the subquery. The outer query uses that
name to refer to the columns from the subquery. The alias
hours is given to the column represented by
SUM(hours_logged), making it easy to refer to that
column from the outer query.

The call to TRUNC(time_log_date,'Year') converts
each date to January 1^st of its year. Any
date in 2004 will be truncated to 1-Jan-2004. It's
easy, then, to eliminate time log entries for other years.

Because subqueries in the FROM clause are treated the same as tables,
it stands to reason they can take part in joins. Example 4-33 expands on Example 4-32 by
adding another level of subquery and a join to the
project table in order to include project names in
the query output.

Example 4-33. Joining a subquery to a table

SELECT minmax_hours.project_id, p.project_name,
minmax_hours.min_hours, minmax_hours.max_hours
FROM (
SELECT phs.project_id, MIN(phs.hours) min_hours, MAX(phs.hours)
max_hours
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours
WHERE project_id IN (1001, 1002)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004',
'dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id) minmax_hours JOIN project p
ON minmax_hours.project_id = p.project_id;
PROJECT_ID PROJECT_NAME MIN_HOURS MAX_HOURS
---------- ---------------------------------------- ---------- ----------
1001 Corporate Web Site 4 20
1002 Enterprise Resource Planning System 8 24 There are different ways you can approach the join to
project shown in Example 4-33. The
join could occur in the innermost subquery or in the middle subquery,
and either of those alternatives would eliminate the need for a third
subquery. Sometimes it's a judgment call as to which
approach is best. Sometimes you need to make that call based on
readability or on a desire not to mess with a working query. By
adding an outer SELECT statement to an already working query, I
avoided the need to tamper with a SELECT statement, the middle one in
Example 4-33, which I knew worked.

There's a performance issue, too, that
isn't obvious in the amount of sample data used for
the examples in this book. Joining to project in
the innermost query would force the join to take place before any
aggregation at all. Given a large enough number of rows in
project_hours, a join that early would
significantly increase the expenditure of I/O and CPU resources by
the query, as all those detail rows would need to be joined. Having
the join occur where it does in Example 4-33 means
that only two rows, the two returned by the middle query, need to be
joined to project.

Subqueries in the FROM clause are sometimes referred to as
inline views , and such a subquery can be
considered as a dynamically created view, for the purpose of the one
query. Any subquery in the FROM clause can be replaced by an
equivalent view, but then you have the problem of creating that view,
which is a permanent database object.

You can see another interesting use of inline views, this time to
generate rows of data that don't already exist in
the database, in the article Creating Pivot
Tables at http://gennick.com/pivotl.

4.7.2 Testing for Representation

You can use subqueries to see whether a row is representative of a
set. The query in Example 4-32 contains the following
WHERE condition:

WHERE project_id IN (1001, 1002) This condition tests whether a row from the
project_hours table is associated with project
1001 or 1002. Imagine a more complicated scenario. Imagine that
you're interested in all projects having budgets of
$1,000,000 or more. You don't know which projects
have such large budgets. Furthermore, budgets change from time to
time, and you don't want to have to modify the
project ID numbers in your query each time your budgets change.
Instead of hard-coding the project_id list for the
IN predicate, you can generate that list using a subquery in the IN
predicate, as shown in Example 4-34.

Example 4-34. A subquery generating values for an IN predicate

SELECT phs.project_id, MIN(phs.hours), MAX(phs.hours)
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours
WHERE project_id IN (SELECT project_id
FROM project
WHERE project_budget >= 1000000)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004','dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id;

When you execute the SELECT statement shown in Example 4-34, the IN predicate's subquery
is executed first to generate a list of project IDs encompassing all
projects having budgets of $1,000,000 or more. Another approach to
this problem is to issue a subquery for each
project_hour row to see whether the associated
project has the required budget. This approach is shown in Example 4-35.

Example 4-35. An EXISTS subquery checking for projects with large budgets

SELECT phs.project_id, MIN(phs.hours), MAX(phs.hours)
FROM (SELECT employee_id, project_id, SUM(hours_logged) hours
FROM project_hours ph
WHERE EXISTS (SELECT *
FROM project pb
WHERE pb.project_budget >= 1000000
AND pb.project_id = ph.project_id)
AND TRUNC(time_log_date,'Year') = TO_DATE('1-Jan-2004','dd-Mon-yyyy')
GROUP BY employee_id, project_id) phs
GROUP BY project_id;

The subquery in Example 4-35 is known as a
correlated subquery , meaning that the subquery
references a value from the parent statement. Aliases are important
when writing such subqueries. The parent query's
project_hours table is aliased as
ph. The subquery references the project ID from
each candidate row from the main query via the reference to
ph.project_id. The subquery is executed once for
each row that can potentially be returned by the main query.

Subqueries prior to Example 4-35 have all been
non-correlated , which means that you can execute
them independently of their enclosing statement. Non-correlated
subqueries are executed just once per execution of their containing
query. Correlated subqueries are executed once per row. Either
subquery may be written using IN or EXISTS, but typically
you'll find that IN subqueries are non-correlated
and EXISTS subqueries are correlated.

It's pretty much impossible to provide a general
rule as to when to use IN and EXISTS predicates when either can be
used to solve a particular problem. Given the low number of rows in
the project table as compared to the
project_hours table, you might think that the
SELECT in Example 4-34 would be more efficient than
the one in Example 4-35. After all, the IN
predicate's subquery must execute only once, and
return only two values. However, in my database, with the release of
Oracle that I'm running and the sample data
I'm using for this book, the SELECT in Example 4-35 requires far less input/output than Example 4-34. A good practice is to test both approaches,
perhaps using SET AUTOTRACE ON as described in Chapter 12, and then use the one that performs best
for your particular query.

Nulls in NOT IN Predicates

When writing NOT IN predicates, be sensitive to the possible presence
of nulls in the results of your subqueries. The following query
attempts to use NOT IN to return a list of all employees terminated
on any date on which at least one employee hired in 2004 was
terminated:

SELECT employee_id, employee_termination_date
FROM employee
WHERE employee_termination_date NOT IN (
SELECT employee_termination_date
FROM employee
WHERE '2004' =
TO_CHAR(
TRUNC(employee_hire_date,'year'), 'yyyy')
);

Executed against the sample data for this book, this query will
return no rows because the subquery returns one or more nulls. A
single null prevents the NOT IN condition from ever returning true
because the null is treated as an unknown. Is a given termination
date not in the set? The answer is unknown because
there's at least one unknown value in the set. Be
very careful of this scenario when writing NOT IN subqueries. Take
care that such subqueries do not return nulls.

4.7.3 Generating Data for INSERTs and UPDATEs

Subqueries are useful in generating values for INSERT and UPDATE
statements. Examples Example 4-36 and Example 4-37 show two ways of creating a reporting table of
project billing data summarized by week. You might generate, and
periodically refresh, such a table to make it easier for end users to
check on project status using ad hoc query tools.

Example 4-36 uses a CREATE TABLE AS SELECT FROM
statement, which lets you create and populate the reporting table in
one step. Example 4-37 creates the reporting table
separately, and then populates the table using an INSERT...SELECT
FROM statement. In each case, a subquery generates the data to be
inserted into the new table.

Example 4-36. Using CREATE TABLE AS SELECT FROM

DROP TABLE project_time;
CREATE TABLE project_time AS
SELECT EXTRACT (YEAR FROM time_log_date) year,
TO_NUMBER(TO_CHAR(time_log_date,'ww')) week_number,
p.project_id, p.project_name, SUM(ph.hours_logged) hours_logged
FROM project p JOIN project_hours ph
ON p.project_id = ph.project_id
GROUP BY EXTRACT (YEAR FROM time_log_date),
TO_NUMBER(TO_CHAR(time_log_date,'ww')),
p.project_id, p.project_name;

Example 4-37. Using INSERT...SELECT FROM

DROP TABLE project_time;
CREATE TABLE project_time (
year NUMBER,
week_number NUMBER,
project_id NUMBER(4),
project_name VARCHAR2(40),
hours_logged NUMBER);
INSERT INTO project_time
SELECT EXTRACT (YEAR FROM time_log_date),
TO_NUMBER(TO_CHAR(time_log_date,'ww')),
p.project_id, p.project_name, SUM(ph.hours_logged)
FROM project p JOIN project_hours ph
ON p.project_id = ph.project_id
GROUP BY EXTRACT (YEAR FROM time_log_date),
TO_NUMBER(TO_CHAR(time_log_date,'ww')),
p.project_id, p.project_name;

Having created the project_time table with its
redundant project_name column,
it's a given that someone, someday, will come along
and change a project name on you, leaving you to sort out the
resulting mess by somehow propagating the new project name to all the
summary rows in project_time. That kind of update
is easily done using a subquery in the SET clause of an UPDATE
statement, as in Example 4-38.

Example 4-38. A subquery generating a value for an UPDATE

UPDATE project_time pt
SET pt.project_name = (SELECT p.project_name
FROM project p
WHERE p.project_id = pt.project_id)
WHERE pt.project_name <> (SELECT p.project_name
FROM project p
WHERE p.project_id = pt.project_id) The UPDATE in Example 4-38 reads each row in the
project_time table and updates those
project_time rows subject to name changes.

Oracle SQLPlus [Electronic resources] : The Definitive Guide, 2nd Edition نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی