SQL Bible [Electronic resources] نسخه متنی

Outer Joins: Joining Tables on Columns Containing NULL Values

You probably noticed in the RESELLER table presented earlier in this chapter that the query returns all table records except one for ACME, INC. This is because the ACME, INC. record in the RESELLER table has NULL in the RESELLER_SUPPLIER_ID column, so an RDBMS cannot find the corresponding value in the table you are trying to join (in this case, the other instance of RESELLER table). As the result, the query returns nine rows even though the table contains ten records. That's just the way the standard (inner) join works. Sometimes, however, you want a query to return all rows from table A and the corresponding rows from table B — if they exist. That's where you use outer joins.

Two syntaxes for outer joins

Like inner joins, the outer joins also have two different syntaxes.

SQL99

The SQL99-compliant syntax indicates outer join in the FROM clause of the SELECT statement:

...
FROM  <table1>
{LEFT | RIGHT | FULL [OUTER]} | UNION JOIN
<table2>
[ON <condition>] | [USING <column_name>,...],...
...

The syntax is generally supported by all our three RDBMS vendors. The exceptions are the USING clause that is implemented only by Oracle and the UNION clause (discussed later in this chapter) that is not a part of any of the "big three" databases syntax.

Old syntax

The old syntax uses the WHERE clause and is different for Oracle and MS SQL Server. (DB2 does not have any "old" syntax for outer joins at all; it was using the SQL99-compliant syntax from the very beginning. In fact, the latter syntax has originated from DB2 standards.)

Oracle 9i

The old right or left outer join syntax is denoted by the plus operator (+) placed after the name of the table with no matching rows on the corresponding side of the = sign. The full outer join can be only specified with the SQL99-compliant syntax. The old Oracle syntax for right and left outer joins is shown below:

Syntax for right outer join

...
WHERE [<qualifier>.]<column_name> =
[<qualifier>.]<column_name> (+)
[AND  [<qualifier>.]<column_name> = 
[<qualifier>.]<column_name> (+)
],...
...

Syntax for left outer join

...
WHERE [<qualifier>.]<column_name> (+) = 
[<qualifier>.]<column_name>
[AND  [<qualifier>.]<column_name> (+) = 
[<qualifier>.]<column_name>
],...
...

MS SQL Server 2000

The right or left outer join is denoted by the asterisk (*) placed on the appropriate (right or left) side of the equals sign (=). The full outer join is unavailable in the old syntax. The following old syntax could be used in MS SQL Server for right and left outer joins:

Syntax for right outer join

...
WHERE [<qualifier>.]<column_name> =*
[<qualifier>.]<column_name>
[AND  [<qualifier>.]<column_name> =*
[<qualifier>.]<column_name>
],...
...

Syntax for left outer join

...
WHERE [<qualifier>.]<column_name> *= 
[<qualifier>.]<column_name>
[AND  [<qualifier>.]<column_name> *=
[<qualifier>.]<column_name>
],...
...

Left outer join

In fact, the term "left outer join" is just a convention used by SQL programmers. You can achieve identical results using left or right outer joins as we will demonstrate later in this chapter. The whole idea behind an outer join is to retrieve all rows from table A (left) or table B (right), even though there are no matching columns in the counterpart table, so the join column(s) is NULL. A left (or right) outer join also returns nulls for all unmatched columns from the joined table (for rows with NULL join columns only).

SQL99 standard syntax

The following query illustrates how to produce the resulting set containing all ten rows from RESELLER table using SQL99-compliant left outer join:

SELECT            r.reseller_id_n   AS res_id,  
r.reseller_name_s AS res_name, 
s.reseller_id_n   AS sup_id, 
s.reseller_name_s AS sup_name
FROM              reseller r 
LEFT OUTER JOIN 
reseller s
ON              r.reseller_supplier_id = s.reseller_id_n
RES_ID RES_NAME                     SUP_ID SUP_NAME
------ ---------------------------- ------ --------------------------
1 ACME, INC.                  NULL   NULL
2 MAGNETICS USA INC.          1      ACME, INC
3 MAGNETOMETRIC DEVICES INC.  1      ACME, INC
4 FAIR PARK GARDENS           2      MAGNETICS USA INC.
5 FAIR AND SONS AIR CONDTNG   2      MAGNETICS USA INC.
6 FABRITEK INC.               2      MAGNETICS USA INC.
7 WILE ELECTRONICS INC.       3      MAGNETOMETRIC DEVICES INC.
8 INTEREX USA                 3      MAGNETOMETRIC DEVICES INC.
9 JUDCO MANUFACTURING INC.    4      FAIR PARK GARDENS
10 ELECTRO BASS INC.           5      FAIR AND SONS AIR CONDTNG
(10 rows affected)

One more example. Assume we need to retrieve customer name and all order numbers for customer 152. The following (inner) join will do:

SELECT  cust_name_s,
ordhdr_nbr_s
FROM    customer
JOIN
order_header
ON      cust_id_n = ordhdr_custid_fn
WHERE   cust_id_n = 152
CUST_NAME_S        ORDHDR_NBR_S                                                                    
------------------ ------------                                                  
...                ...
WILE BESS COMPANY  523731                                                                          
WILE BESS COMPANY  523732                                                                          
...                ...                                                                                
31 rows selected.

Now we need very similar results except that we also want corresponding payment terms for each order. We assume we could simply modify our query by joining PAYMENT_TERMS table to it:

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s
FROM    customer
JOIN
order_header
ON    cust_id_n = ordhdr_custid_fn
JOIN
payment_terms
ON    payterms_id_n = ordhdr_payterms_fn
WHERE   cust_id_n = 152;
CUST_NAME_S        ORDHDR_NBR_S PAYTERMS_DESC_S                                                                 
------------------ ------------ ---------------
...                ...          ...
WILE BESS COMPANY  523732       2% 15 NET 30                                                                    
...                ...          ...
30 rows selected.

To our surprise, the query now returns thirty rows instead of thirty one. The reason is order 523731 for WILE BESS COMPANY has NULL in the ORDHDR_PAYTERMS_FN column, so the row is completely excluded from the resultset if we use inner join. The solution is to use outer join:

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s
FROM    customer
JOIN
order_header
ON    cust_id_n = ordhdr_custid_fn
LEFT OUTER JOIN
payment_terms
ON    payterms_id_n = ordhdr_payterms_fn
WHERE   cust_id_n = 152;
CUST_NAME_S        ORDHDR_NBR_S PAYTERMS_DESC_S                                                                 
------------------ ------------ ---------------
...                ...          ...
WILE BESS COMPANY  523731       NULL                                                                    
WILE BESS COMPANY  523732       2% 15 NET 30                                                                    
...                ...          ...
31 rows selected.

Old syntax

The old syntax for outer joins varies from vendor to vendor.

Oracle 9i

Oracle did not become compliant with SQL99 syntax for outer joins until version 9i. If you use an earlier version of Oracle, an outer join would be announced by the plus sign enclosed by parentheses, (+), placed after the table name that does not have matching rows. The query producing results identical to the previous example would be as follows:

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s  
FROM    customer,
order_header,
payment_terms
WHERE   cust_id_n = ordhdr_custid_fn   
AND     ordhdr_payterms_fn = payterms_id_n (+) 
AND     cust_id_n = 152

The confusion is compounded by the fact that in Oracle the join is usually called a "right outer join" — because the (+) sign is on the right side of the = sign.

DB2 UDB 8.1

DB2 uses only the standard SQL99 syntax for left outer join.

MS SQL Server 2000

The old MS SQL Server syntax for left outer join is to put an asterisk on the left side of the equals sign in the WHERE clause. The left outer join that produces results identical to those with SQL99 syntax is

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s  
FROM    customer,
order_header,
payment_terms
WHERE   cust_id_n = ordhdr_custid_fn   
AND     ordhdr_payterms_fn *= payterms_id_n 
AND     cust_id_n = 152

Right outer join

As we mentioned before, the only difference between left and right outer joins is the order in which the tables are joined in the query. To demonstrate that we'll use queries that produce exactly same output as in the previous section.

SQL99 standard syntax

As you can see, the resulting set of the inner join of ORDER_HEADER and CUSTOMER is on the right-hand side from the PAYMENT_TERMS table:

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s  
FROM    payment_terms 
RIGHT OUTER JOIN 
order_header
ON    payterms_id_n = ordhdr_payterms_fn 
JOIN    
customer
ON      cust_id_n = ordhdr_custid_fn
WHERE   cust_id_n = 152
CUST_NAME_S        ORDHDR_NBR_S                                                                    
------------------ ------------                                                  
...                ...
WILE BESS COMPANY  523731                                                                          
WILE BESS COMPANY  523732                                                                          
...                ...                                                                                
31 rows selected.

Old syntax

We already explained the old syntax for left outer join in previous section; the syntax for right outer join is very similar.

Oracle 9i

As we mentioned in the section about left outer join, the old definition of left and right outer joins in Oracle is vague. The equivalent to the above query using the old syntax is

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s  
FROM    customer,
order_header,
payment_terms
WHERE   cust_id_n = ordhdr_custid_fn   
AND     payterms_id_n (+) = ordhdr_payterms_fn
AND     cust_id_n = 152

The (+) sign has moved to the left along with the column name from the PAYMENT_TERMS table (PAYTERMS_ID_N) that does not have matching rows. The output is identical to what all other "identical" queries produce; this type of join is called "left outer join" in Oracle.

Note

Many Oracle users are confused by the "new" SQL99 outer join syntax. The problem is, they used to call an outer join "left" or "right" depending on what side of the equals sign, =, the outer join sign, (+), was located. The "new" syntax takes a different approach — the term "left" or "right" identifies the relational position of the table from which you want to retrieve all rows, no matter if the other table that participates in the join operation has matching rows or not.

DB2 UDB 8.1

DB2 uses only the standard SQL99 syntax for right outer join.

MS SQL Server 2000

The old MS SQL Server syntax for right outer join is to put an asterisk on the right side of the equal sign in the WHERE clause. The right outer join that produces results identical to ones from SQL99 syntax is

SELECT  cust_name_s, 
ordhdr_nbr_s, 
payterms_desc_s  
FROM    customer,
order_header,
payment_terms
WHERE   cust_id_n = ordhdr_custid_fn   
AND     payterms_id_n =* ordhdr_payterms_fn
AND     cust_id_n = 152

Full outer join

Full outer join is the combination of left and right outer join. It returns all rows from both "left" and "right" tables, no matter if the counterpart table has matching rows or not. For example, in the ACME database there are some customers that did not place any orders yet — as well as some orders with no customers assigned to them.

Note

An order without a customer may sound unusual, but think of a situation in which customers A and B make very similar orders on a weekly basis. Customer service personnel create a few "barebones" orders when they have spare time and leave the customer number field blank (as well as some other fields), so when a customer actually calls to place the order, it takes less time to process the request.

The query that retrieves all customers without orders as well as all orders with no customer assigned to them is shown below:

SELECT  customer.cust_name_s, 
order_header.ordhdr_nbr_s
FROM    customer 
FULL OUTER JOIN 
order_header 
ON customer.cust_id_n = order_header.ordhdr_custid_fn
CUST_NAME_S                                 ORDHDR_NBR_S
----------------------------------------    -------------
...
WILE SEAL CORP.                                    523774
WILE SEAL CORP.                                    523775
WILE SEAL CORP.                                    523776
WILE SEAL CORP.                                    523777
WILE SEAL CORP.                                    523778
...                                                ...
WILE BESS COMPANY                                  523730
NULL                                               523727
NULL                                               523728
MAGNETICS USA INC.                                 NULL
MAGNETOMETRIC DEVICES INC.                         NULL
FAIR PARK GARDENS                                  NULL
...
83 record(s) selected.

The above syntax is SQL99-compliant and is the only one available for full outer join in all our "big three" databases. It would be logical to suggest something like (+) = (+) for Oracle and *=* for MS SQL Server, but these are not valid.

Union join

The UNION join (not to be confused with the UNION operator) could be thought of as the opposite of an inner join — its resulting set only includes those rows from both joined tables for which no matches were found; the columns from the table without matching rows are populated with nulls.

Cross-References

Applying set theory basics, covered in Appendix L, you could say that A UNION JOIN B = (A FULL OUTER JOIN B) DIFFERENCE (A INNER JOIN B).