SQL Statement Processing Steps
The internal mechanisms of SQL statementprocessing are different for each RDBMS, but generally, there are always these
five steps involved to process an SQL statement.
The RDBMS parses the SQL statement by
breaking it up into individual words (tokens) and validating the statement
syntax. The purpose of this step is to make sure the statement does not have
typos and logical errors in it (if it does, the processing stops right there,
saving time and system resources). For example, the following statements would
be rejected at the first step because the first one has keyword
SELECT misspelled, and in the second
one the order of the
FROM and
WHERE clauses is incorrect:
SELCT * FROM
CUSTOMER SELECT * WHERE CUST_ID_N = 65 FROM CUSTOMER
Note | Generally, any SQL statement consists of three types of tokens: keywords like SELECT, UPDATE, WHERE, etc.; table, view, constraint, or other database object names and column names; and values assigned to the columns or used in WHERE and HAVING clause predicates (or special variables for these values that could be substituted with real values in the runtime). In Step 1 of the process described here, only the first group of tokens is validated. |
Parsing is usually the quickest step because it does not require
actual database access.
The statement is then validated by the RDBMS
by checking the statement against the information schema. In fact, this step
validates the second group of tokens by making sure all the table and column
names exist in the database. In addition, it checks the user privileges to
execute the statement, ensures the column names are not ambiguous, and so on.
For example, the following statement would fail at Step 2 (if you try to
execute it against ACME sample database) because column
CUST_ID does not exist in the
CUSTOMER table:
UPDATE CUSTOMER SET
CUST_NAME_S = 'THE BEST CUSTOMER' WHERE CUST_ID = 65
The next step is to optimize the statement.
The query optimization process differs significantly between different RDBMS
vendors, but the idea is the same: to find the most efficient way to execute
the SQL statement. For example, it determines in which order the tables should
be joined, when search conditions are to be applied, whether use the existing
indexes on certain columns, and so on. Optimization is usually very
CPU-intensive; the RDBMS may need to choose from dozens of different ways to
process the same query, especially if it is complex and involves multiple table
joins and conditions; however, when done in the right way, optimization could
improve the query performance dramatically.
The next step is to generate an execution
plan for the statement based on the optimization process performed
during Step 3. The execution plan is simply a binary representation of the
optimized sequence of actions.
Finally, the set of binary instructions created in Step 4 is
executed by the RDBMS. Most database vendors provide mechanisms that allow you
to substitute the third group of token variables with the actual values at this
point.
Note | In addition to the standard five steps to process any SQL statement, an additional step is necessary when processing a SELECT statement, namely returning the query results in some organized form. |
We will be talking about the five steps of
SQL statement processing again in this chapter when discussing dynamic
SQL.