5.5 Querying with SQL SELECT
The
SELECT
statement is used to query and
retrieve one or more rows from a database. We introduce it in this
section, and then show you the WHERE clause for
selecting data that matches a condition. The section concludes with
an introduction to the more advanced features of
SELECT statements and a short case study.
5.5.1 Basic Querying
Consider an example SELECT statement:
SELECT surname, firstname FROM customer;This outputs the values of the attributes surname
and firstname from all rows in the
customer table. Assuming we previously inserted
four rows when we created the winestore
database, the output from the MySQL command interpreter is:
+-----------+-----------+Any attributes of a table may be listed in a
| surname | firstname |
+-----------+-----------+
| Marzalla | Dimitria |
| LaTrobe | Anthony |
| Fong | Nicholas |
| Stribling | James |
+-----------+-----------+
4 rows in set (0.04 sec)
SELECT statement by separating them with a comma.
If all attributes are required, the shortcut of an asterisk character
(*) can be used. Consider the statement:
SELECT * FROM region;This outputs all the data from the table region:
+-----------+---------------------+SELECT statements can also output data that
| region_id | region_name |
+-----------+---------------------+
| 1 | All |
| 2 | Goulburn Valley |
| 3 | Rutherglen |
| 4 | Coonawarra |
| 5 | Upper Hunter Valley |
| 6 | Lower Hunter Valley |
| 7 | Barossa Valley |
| 8 | Riverland |
| 9 | Margaret River |
| 10 | Swan Valley |
+-----------+---------------------+
10 rows in set (0.01 sec)
isn't from a database. Consider the following
example:
SELECT curtime( );This example runs a function that displays the current time:
+-----------+The SELECT statement can even be used as a simple
| curtime( ) |
+-----------+
| 08:41:50 |
+-----------+
1 row in set (0.02 sec)
calculator, using the MySQL mathematical functions described in Chapter 15:
SELECT pi( )*(4*4);This outputs:
+------------+
| pi( )*(4*4) |
+------------+
| 50.265482 |
+------------+
1 row in set (0.01 sec)
5.5.2 WHERE Clauses
A
WHERE clause is used as part of most
SELECT queries to limit the rows that are
retrieved to those that match a condition.Consider this grape-growing region table
containing the details of ten regions:
mysql> SELECT * from region;To show only the first three regions, you can type:
+-----------+---------------------+
| region_id | region_name |
+-----------+---------------------+
| 1 | All |
| 2 | Goulburn Valley |
| 3 | Rutherglen |
| 4 | Coonawarra |
| 5 | Upper Hunter Valley |
| 6 | Lower Hunter Valley |
| 7 | Barossa Valley |
| 8 | Riverland |
| 9 | Margaret River |
| 10 | Swan Valley |
+-----------+---------------------+
10 rows in set (0.09 sec)
SELECT * FROM region WHERE region_id <= 3;This outputs all attributes for the first three rows:
+-----------+-----------------+You can combine the attribute and row restrictions and select only
| region_id | region_name |
+-----------+-----------------+
| 1 | All |
| 2 | Goulburn Valley |
| 3 | Rutherglen |
+-----------+-----------------+
3 rows in set (0.03 sec)
the region_name attribute for the first three
regions:
mysql> SELECT region_name FROM region WHERE region_id <= 3;The SQL Boolean operators AND and
+-----------------+
| region_name |
+-----------------+
| All |
| Goulburn Valley |
| Rutherglen |
+-----------------+
3 rows in set (0.01 sec)
OR have the same function as the PHP
&& and || operators
introduced in Chapter 2. These can be used to
develop more complex WHERE clauses (and these can
be combined with the MySQL functions described in Chapter 15). Consider an example query:
SELECT * FROM customer WHERE surname='Marzalla' AND firstname='Dimitria';This retrieves rows that match both criteria, that is, those
customers with a surname Marzalla and a
firstname Dimitria. In this example, you need to
be careful to type the strings 'Marzalla' and
'Dimitria' using the correct case because string
values are case sensitive.Consider a more complex example:
SELECT cust_id FROM customerThis finds rows with either the surname Marzalla
WHERE (surname='Marzalla' AND firstname LIKE 'M%') OR
birth_date='1980-07-14';
and a firstname beginning with M, or customers who
were born on 14 July 1980; the
LIKE operator is
discussed in more detail in Chapter 15. The
OR operator isn't exclusive, so a
row can contain a birth date of 14 July 1980, a
surname of Marzalla, and a
firstname beginning with M. This query, when run
on the winestore database, returns:
+---------+SELECT queries are often sophisticated and a long
| cust_id |
+---------+
| 440 |
| 493 |
+---------+
2 rows in set (0.01 sec)
WHERE clause may include many
AND and OR operators. More
complex examples of queries are shown later in this chapter. As
discussed previously, the WHERE clause is also a
common component of UPDATE and
DELETE statements.
5.5.3 Sorting and Grouping Output
Listing attributes in the
SELECT statement and using
WHERE allows you to decide what rows and columns
in a table are returned from a query. However, you might also want to
sort the data after it's returned, or you might want
to group it together beforehand so that you can count the number of
rows with different values, find a minimum or maximum value, or sum a
numeric field. This section shows you how to pre- and post-process
your data.
5.5.3.1 ORDER BY
The ORDER BY clause
sorts the data after the query has been evaluated. Consider an
example:
SELECT surname, firstname FROM customerThis query finds all customers who live in Portsea and who have the
WHERE city = 'Portsea' and firstname = 'James' ORDER by surname;
first name James. It then presents the results sorted alphabetically
by ascending surname:
+-----------+-----------+Sorting can be on multiple attributes. For example:
| surname | firstname |
+-----------+-----------+
| Leramonth | James |
| Mockridge | James |
| Ritterman | James |
+-----------+-----------+
3 rows in set (0.00 sec)
SELECT surname, firstname, initial FROM customerThis presents a list of customers who live in Coonawarra or Longwood,
WHERE city = 'Coonawarra' OR city = 'Longwood'
ORDER BY surname, firstname, initial;
sorted first by ascending surname, then (for those
customers with the same surname) by firstname, and
(for those customers with the same surname and first name), by
initial. The output for the winestore
customer table is:
+------------+-----------+---------+By default, the ORDER BY clause
| surname | firstname | initial |
+------------+-----------+---------+
| Archibald | Belinda | Q |
| Chester | Marie | S |
| Dalion | Marie | C |
| Eggelston | Martin | E |
| Florenini | Melinda | O |
| Holdenson | Jasmine | F |
| Mellaseca | Craig | Y |
| Mockridge | Dimitria | I |
| Morfooney | Chris | K |
| Nancarral | Samantha | W |
| Oaton | Joel | V |
| Oaton | Rochelle | F |
| Patton | Joel | Z |
| Patton | Penelope | E |
| Patton | Samantha | |
| Rosenthal | Chris | A |
| Tonkin | Michelle | Z |
| Tonnibrook | Belinda | T |
+------------+-----------+---------+
18 rows in set (0.00 sec)
sorts in ascending order, or ASC. To sort in
reverse or descending order, DESC can be used.
Consider an example:
SELECT * FROM customer WHERE city='Melbourne' ORDER BY surname DESC;
5.5.3.2 GROUP BY
The GROUP BY clause is
different from ORDER BY because
it doesn't sort the data for output. Instead, it
sorts the data early in the query process, for the purpose of
grouping or
aggregation
.
Grouping data using a sort is the easiest way to discover properties
such as maximums, minimums, averages, and counts of values.Consider an example:
SELECT city, COUNT(*) FROM customer GROUP BY city;This query first sorts the rows in the customer
table by city and groups the rows with matching
values together. The output of the query consists of two columns. The
first is a sorted list of unique cities. The second shows, for each
city, the COUNT of the number of customers who
live in that city. The number of rows that are output is equal to the
number of different city values in the
customer table, and the effect of
COUNT(*) is to count the number of rows per group.Here are the first few lines output by the query:
+--------------+----------+So, for example, there are 14 customers who live in Alexandra, that
| city | COUNT(*) |
+--------------+----------+
| Alexandra | 14 |
| Armidale | 7 |
| Athlone | 9 |
| Bauple | 6 |
| Belmont | 11 |
| Bentley | 10 |
| Berala | 9 |
| Broadmeadows | 11 |
is, 14 rows in the customer table are grouped
together because they have a city value of
Alexandra.The GROUP BY clause can find
different properties of the aggregated rows. Here's
an example:
SELECT city, MIN(birth_date) FROM customer GROUP BY city;This query first groups the rows by city and then shows the oldest
customer in each city. The first few rows of the output are as
follows:
+---------------+-----------------+
| city | MIN(birth_date) |
+---------------+-----------------+
| Alexandra | 1938-04-01 |
| Armidale | 1943-04-04 |
| Athlone | 1943-04-04 |
| Bauple | 1922-11-26 |
|
GROUP BY clause. Five
particularly useful functions are:
AVG( )
Finds the average value of a numeric attribute in a set
MIN( )
Finds a minimum value of a string or numeric attribute in a set
MAX( )
Finds a maximum value of a string or numeric attribute in a set
SUM( )
Finds the sum total of a numeric attribute
COUNT( )
Counts the number of rows in a set
The SQL standard places a constraint on the GROUP
BY clause that MySQL doesn't
enforce. In the standard, all attributes that are selected (those
that are listed immediately after the SELECT
statement) must appear in the GROUP
BY clause. Most examples in this chapter
don't meet this unnecessary constraint.
5.5.3.3 HAVING
The
HAVING clause
permits conditional aggregation of data into groups. For example,
consider the following query:
SELECT city, count(*), min(birth_date) FROM customerThe query groups rows by city, but only for cities
GROUP BY city HAVING count(*) > 10;
that have more than 10 resident customers. For those groups, the
city, count of customers, and earliest birth date of a customer in
that city is output. Cities with less than 10 customers are omitted
from the result set. The first few rows of the output are as follows:
+--------------+----------+-----------------+The HAVING clause must contain an attribute or
| city | count(*) | min(birth_date) |
+--------------+----------+-----------------+
| Alexandra | 14 | 1938-04-01 |
| Belmont | 11 | 1938-04-01 |
| Broadmeadows | 11 | 1955-10-13 |
| Doveton | 13 | 1943-04-04 |
| Eleker | 11 | 1938-04-01 |
| Gray | 12 | 1943-04-04 |
expression (such as a function or an alias) from the
SELECT clause; in this example,
count(*) is listed after the
SELECT and is used in the
HAVING condition.The HAVING clause should be used exclusively with
the GROUP BY clause. It is slow and should never
be used instead of a WHERE clause. For example,
don't do this:
SELECT cust_id, surname FROM customer HAVING surname = "Leramonth";Do this instead:
SELECT cust_id FROM customer WHERE surname = "Leramonth";
5.5.3.4 Combining clauses
You can combine
ORDER BY,
GROUP BY,
HAVING, and WHERE. When all
four are used, they must appear in the order
WHERE, then GROUP
BY, then HAVING, and then
ORDER BY. This is intuitive
because the WHERE clause picks the rows from the
table, then GROUP BY organizes
the rows into sets, then HAVING picks the sets
that match a condition, and then the data is sorted by the
ORDER BY condition just before
it's output.Consider an example. Suppose we want to find the number of customers
with the same name who live in each city in the state of Victoria,
where the same name is defined as the same first name and surname.
For example, this might determine that there are five John Smiths who
live in Inverloch and three Tuong Nguyens in Carlton.
Here's the query:
SELECT city, surname, firstname, count(*) FROM customerThe query first uses the WHERE clause to pick the
WHERE state = 'VIC'
GROUP BY surname, firstname HAVING count(*) >= 2
ORDER BY city;
rows of customers that live in the state of Victoria. The rows are
then grouped together into sets, where the grouping condition is that
the customer surname and firstname are the same. Then, only those
sets that have more than one customer with the same name are kept by
the HAVING clause; this gets rid of unique names.
Last, the ORDER BY clause sorts the customers by
their city, and the city, first name, surname, and count of the
number of customers is output. Here is the output from the winestore
customer table:
+--------------+-----------+-----------+----------+The output shows, for example, that there are two Betty Chesters who
| city | surname | firstname | count(*) |
+--------------+-----------+-----------+----------+
| Broadmeadows | Mellaseca | Anthony | 2 |
| Eleker | Leramonth | Harry | 2 |
| Kalimna | Galti | Nicholas | 2 |
| Lucknow | Mellili | Derryn | 2 |
| McLaren | Chester | Betty | 2 |
+--------------+-----------+-----------+----------+
5 rows in set (0.00 sec)
live in McLaren city in the state of Victoria.The GROUP BY clause sorts
before it groups the rows into sets. Therefore, you
don't need to use ORDER BY if you
want the data to be output in the sort order used by the
GROUP BY. For example, you
don't need to do this:
SELECT * FROM customer GROUP BY surname ORDER BY surname;If you leave out the ORDER BY clause,
you'll get the same output:
SELECT * FROM customer GROUP BY surname;However, in practice, it doesn't really matter: the
MySQL query optimizer will ignore the ORDER
BY clause if it's unnecessary. We
discuss the query optimizer in Chapter 15.
5.5.3.5 DISTINCT
Suppose we want to find out which
different cities our customers live in. The following query shows the
cities for all of the customers:
SELECT city FROM customer;The problem is that a city name appears more than once if more than
one customer lives in that city. What we really want is a list of
unique cities that the customers live in.The DISTINCT clause presents only one example of
each identical row from a query. We can use it to find out the unique
cities the customers live in:
SELECT DISTINCT city FROM customer;This shows one example of each different city in the
customer table.This example has exactly the same result as:
SELECT ciry FROM customer GROUP BY city;The DISTINCT clause is often slow to run, much
like the GROUP BY and
HAVING clauses. We discuss how indexes and query
optimization can speed queries in Chapter 15.
5.5.4 Limiting Output in MySQL
The LIMIT
operator is MySQL-specific and is used to control the size of the
output. For example, the following query returns only the first five
rows from the customer table:
SELECT * FROM customer LIMIT 5;This saves query evaluation time and reduces the size of the result
set that's buffered in memory by MySQL.
It's particularly useful in a web database
application where one page of results is presented from a large
table.You can also specify which row to begin at, and then how many rows
you want:
SELECT * FROM customer LIMIT 100,5;This returns the 100th to
104th rows from the
customer table.Row numbering begins at row zero. For example, if you want the first
five rows of the customer table, use:
SELECT * FROM customer LIMIT 0,5;The following statement produces five rows beginning with row two:
SELECT * FROM customer LIMIT 1,5;Be careful: forgetting to count from zero is a common mistake.If you want all rows after a particular row, the second parameter can
be set to -1:
SELECT * FROM customer LIMIT 600,-1;For the winestore customer table, this returns
50 rows with cust_id values of 601 to 650.The LIMIT operator is included at the end of an
SQL statement, after the optional WHERE,
GROUP BY, HAVING, and
ORDER BY clauses.