Numbers
There are three kinds of numbers: integers, floats (approximate numbers), and fixed-decimal numbers. (Technically, integers are a type of fixed-decimal number with a scale of zero, but storage methods for integers and decimal numbers differ so we'll talk about them separately.) There is also a number type that auto-incrementswe'll call this a serial. All DBMSs except Oracle use similar numeric formats, so in our discussion of numeric data types, we'll talk about the "normal" DBMSs first and relegate Oracle's treatment of numbers to a sidebar.
Integers
The SQL Standard provides two data types for columns that contain integers: INTEGER (or INT) and SMALLINT. Table 7-5 shows the SQL Standard requirements and the level of support (data type and size in bytes) the Big Eight have for these data types.
INTEGER | SMALLINT | BIGINT/LONGINT | TINYINT/INTEGER1 | |
---|---|---|---|---|
ANSI SQL | Yes | Yes | No | No |
IBM | 4 | 2 | 8 | No |
Informix | 4 | 2 | 8 | No |
Ingres | 4 | 2 | No | 1 |
InterBase | 4 | 2 | No | No |
Microsoft | 4 | 2 | 8 | 1 |
MySQL | 4 | 2 | 8 | 1 |
Oracle | Yes | Yes | No | No |
Sybase | 4 | 2 | No | 1 |
CREATE TABLE Table1 (
column1 SMALLINT)
INSERT INTO Table1 ...
SELECT column1 * 1234
FROM Table1
is slower than:
CREATE TABLE Table1 (
column1 INTEGER)
INSERT INTO Table1 ...
SELECT column1 * 1234
FROM Table1
GAIN: 5/8
although the difference is unnoticeable unless millions of calculations happen.The INTEGER data type is also the default data type of any literal that contains all digits. Consider the expression:
... WHERE column1 = 555
If column1 is a SMALLINT, the DBMS will have to cast its value to INTEGER before it can make the comparison with the integer literal 555.The other integer data types have special advantages though:SMALLINT is the only other SQL Standard data type and is just half the size of INTEGER.TINYINT is the smallest and is still large enough for personal information, but beware: Even seemingly restricted data like "number of wives" or "height in centimeters" could cause overflow. (Solomon had 700 wives, and Goliath was six cubits and a span, or about 290 centimeters tall.)BIGINT is appropriate for numbers larger than two billion, though working with compilers that can't handle such large numbers could be troublesome.Here's another consideration: At the beginning of this chapter, we said it's important to choose a data type that won't allow impossible values for a column. For example, if you have a list of mileages between world cities, using SMALLINT (and perhaps UNSIGNED if the DBMS allows it) gives you a free check constraintit should be impossible to insert the absurd value 35000. We still hold to this advice, with one caveatIt Won't Always Work. That is:"On some systems, for example, the numeric operations for some data types may silently underflow or overflow."PostgreSQL Interactive Documentation"When asked to store a value in a numeric column that is outside the column type's allowable range, MySQL clips the value to the appropriate endpoint of the range and stores the resulting value instead. Conversions that occur due to clipping are reported as 'warnings' for ALTER TABLE, LOAD DATA INFILE, UPDATE, and multi-row INSERT [but not for single-row INSERT]."MySQL Reference Manual (Square bracket comment ours)If input errors are a serious worry, only an explicit CHECK constraint will do the job right.If you decide to define column1 as INTEGER, don't just consider whether the largest possible value is less than or equal to 214783647, but also whether SUM(column1) will be less than or equal to 214783647. A DBMS will normally decide that the sum of an INTEGER column is a BIGINT or a DECIMALbut it could just overflow instead.Why are telephone numbers stored in CHAR(12) columns instead of BIGINT columns? The answer is that, although a CHAR(12) column is longer, there are built-in functions for all the operations that are likely to be performed on telephone numbers: LIKE, SUBSTRING, SIMILAR, and so on. Meanwhile, all the built-in numeric operators (+ / *) are useless for telephone numbers. The general rule in such cases is that a column should have a non-numeric data type if all appropriate operations for it are non-numeric. A similar generality can be stated for the temporal data types.Recommendation: Prefer INTEGER for integers unless maximum valuesincluding those from arithmetic operationsexceed the INTEGER range.
Floats
The SQL Standard provides three data types for columns that contain floats: REAL, FLOAT, and DOUBLE PRECISION (or DOUBLE). Table 7-6 shows the SQL Standard requirements and the level of support (data type and precision) the Big Eight have for these data types.Notes on Table 7-6:The letters IEEE mean "according to the IEEE 754 Standard for Binary Floating-Point Arithmetic." Informix and PostgreSQL use "the native C float," which coincidentally corresponds to IEEE 754 for any common C compiler.IEEE single storage is 32 bits (four bytes). It allows for seven-digit precision, with a usual range from 3.402E+38 to 1.175E37 for negative numbers, zero, and from +1.175E-37 to +3.402E+38 for positive numbers.REAL and FLOAT(n)where n <= 23are usually synonymous and refer to a 32-bit floating-point number, IEEE single precision. InterBase, though, allows you to define a column with a specific precision for FLOAT, for example:
but ignores the precision specified. InterBase FLOAT(n) is always 64-bit IEEE double precision even though FLOAT alone is always 32-bit IEEE single precision.IEEE double storage is 64 bits (eight bytes). It allows for 15-digit precision, with a usual range from 1.798E+308 to 2.225E-307 for negative numbers, zero, and from +2.225E-307 to +1.798E+308 for positive numbers.DOUBLE PRECISION and FLOAT(n)where n BETWEEN 24 AND 53are usually synonymous and refer to a 64-bit floating-point number, IEEE double precision. For MySQL, REAL is synonymous with DOUBLE, rather than with FLOAT.Oracle accepts columns defined as REAL, FLOAT, or DOUBLE PRECISION, but treats all numbers differently from other DBMSs; see the sidebar "Oracle Numbers."
CREATE TABLE Table1 (
column1 FLOAT(20))
REAL | FLOAT | DOUBLE PRECISION | |
---|---|---|---|
ANSI SQL | Yes | Yes | Yes |
IBM | IEEE single | IEEE double | IEEE double |
Informix | IEEE single | IEEE double | IEEE double |
Ingres | IEEE single | IEEE double | IEEE double |
InterBase | IEEE single | IEEE single | IEEE double |
Microsoft | IEEE single | IEEE double | IEEE double |
MySQL | IEEE double | IEEE single | IEEE double |
Oracle | Yes | Yes | Yes |
Sybase | IEEE single | IEEE double | IEEE double |
UPDATE Table1 SET
float_column = 1.25E02
instead of:
UPDATE Table1 SET
float_column = 125
PortabilityMySQL won't accept a float literal unless it has a two-digit exponent. That is, the literal 1.25E02 is acceptable, but 1.25E2 is not. All other DBMSs allow you to drop the leading zero.Floating-point operations are fast if they go through the computer's Floating Point Unit (FPU), but a compiler can make the cautious assumption that no FPU is present. In that case, floating-point operations are slow because they are emulated instead of performed with the FPU. When you install a DBMS, the installer should detect the FPU automatically and bring in the right code, whether FPU dependent or emulated, so make sure you rerun the install program after hardware upgrades or moves.Take another look at Table 7-6 and the precisions shown for the float data types. The range of IEEE single-precision float is from 1.175E-37 to +3.402E+38, to 7 decimal digits precision, although some DBMSs are more cautious in stating the actual range supported. The range of IEEE double-precision float is from 2.225E-307 to +1.798E+308, to 15 decimal digits precision. Again, some DBMSs give a slightly smaller range. These sizes are shown in Table 7-7.Table 7-7 shows that the range of a single-precision float is 1.175E-37 to +3.402E+38. In reality it isn't possible to store all the real numbers in that range in a four-byte space; it isn't even possible to store all the integers in that range in four bytes (the range of a four-byte INTEGER is from 2.14E9 to +2.14E9). So for most numbers in the single-precision range, you'll need to use whatever number is closest that can be represented in a single float. In other words, a floating-point number is exact in bit combinationsthat is, all bit combinations are exactbut it might not be exactly the same as the number that was inserted originally. Hence the name approximate.The question that arises from this isIs it better to use DOUBLE PRECISION for float columns or REAL? (We'll ignore FLOAT entirely because REAL and DOUBLE are just synonyms for predefined sizes of FLOAT.)We checked the Big Eight to see what happens if the same number is stored both ways. First, we created this table:
CREATE TABLE Table1 (
real_column REAL,
double_column DOUBLE PRECISION)
Precision in Bits (effective mantissa size) | Precision in Decimal Digits(inexact) | Min | Max | |
---|---|---|---|---|
Single Precision | <=23 | 7 | 1.175E-37 | +3.402E+38 |
Double Precision | >=24 AND <=53 | 15 | 2.225E-307 | +1.798E+308 |
INSERT INTO Table1 VALUES (0.01, 0.01)
SELECT * FROM Table1
WHERE real_column = 0.01
/* result is zero rows, "incorrect" */
SELECT * FROM Table1
WHERE double_column = 0.01
/* result is one row, "correct" */
Most DBMSs returned the "correct" result for the second SELECT, but were unable to find a row where real_column contained the value 0.01. Due to their greater precision, DOUBLE PRECISION columns return the expected result more frequently than do REAL columns.Recommendation: Prefer DOUBLE PRECISION for floats.
Decimals
The SQL Standard provides two data types for columns that contain fixed-decimal numbers: DECIMAL and NUMERIC. Table 7-8 shows the SQL Standard requirements and the level of support (data type and maximum precision in digits) the Big Eight have for these data types.
DECIMAL | DECIMAL Precision | NUMERIC | NUMERIC Precision | |
---|---|---|---|---|
ANSI SQL | Yes | N/S | Yes | N/S |
IBM | Yes | 31 | Yes | 31 |
Informix | Yes | 32 | Yes | 32 |
Ingres | Yes | 31 | Yes | 31 |
InterBase | Yes | 18 | Yes | 18 |
Microsoft | Yes | 38 | Yes | 38 |
MySQL | Yes | 254 | Yes | 254 |
Oracle | Yes | 38 | Yes | 38 |
Sybase | Yes | 38 | Yes | 38 |
Form | Data type | Example |
---|---|---|
Digits and no decimal point | INTEGER | 537 |
Digits with decimal point | DECIMAL | 17.7 |
Exponential notation | FLOAT | 1.32E15 |
Definition #1:
CREATE TABLE Table1 (
column1 DECIMAL(7,5),
...)
Definition #2:
CREATE TABLE Table1 (
column1_pre_decimal_point INTEGER,
column1_post_decimal_point INTEGER,
...)
To add 7.35 to such a divided number, add 35 to column1_post_decimal_point. If the result is greater than 100, subtract 100 and carry one, then add seven plus the carry to column1_pre_decimal_point. The work around sounds awfulbut it works more quickly than a decimal addition.We still live in an era where the questionShould I use floats for decimals?makes some sense, but the era is fast approaching its end. The fact is that most sensible bean-count values (i.e., money) can be stored and manipulated as integers now, and with 64-bit processors the values are trillions. The only reason to use floats for dollars is that some host languages still have no equivalent for DECIMAL. It is notable that all the DBMSs that support a MONEY data type store MONEY internally as a DECIMAL with a predefined scale.Recommendation: Prefer DECIMAL for fixed-decimal numbers and for most floats.
Oracle NumbersTaking a different path from the other DBMSs, Oracle uses a single storage methodpacked decimalfor all kinds of numbers. The format consists of two parts:An exponent: size1 byte.A mantissa: sizeup to 20 bytes, packed decimal, possibly includes a 1-byte sign.Leading and trailing zeros are not stored.Packed decimal is a number representation where each number is expressed as a sequence of decimal digits, with each decimal digit encoded as a 4-bit binary number (or nibble). In some cases, the right-most nibble contains the sign (positive or negative).Oracle's system is good for up to 38 digits of decimal precision. The exponent represents the number of digits before the decimal point. Technically, the decimal point "floats," but all integers on the number line between 1.0E38 and +1.0E38 can be represented. Here's how Oracle stores the decimal number 523:
That is, 523 is stored in a format similar to 5.23 x 102. One byte is used for the exponent (2), and two bytes are used for the three significant digits of the mantissa (5, 2, 3). Blank filler is added to the second byte to fill it completely.The system's range is as big as the ranges for SMALLINT, BIGINT, and REAL, so Oracle can accept column definitions for any of those data types. But the storage is always the same. Because Oracle is only simulating a float by using a wide integer, though, details that are specific to the IEEE 754 specificationfor example, Not a Number (NaN)can't be represented.The differences between Oracle and other DBMSs when it comes to numbers can be summarized as follows:Number sizeOracle numbers are variable-length. Other DBMSs' numbers are fixed-length.Storage formatOracle uses one storage format for numbers. Other DBMSs use at least three storage formats for numbers.Biggest integer supportedOracle's biggest integer is 1.0E38. Other DBMSs' biggest integer is 9.2E18 (BIGINT).Biggest float supportedOracle's biggest "float" is 1.0E38. Other DBMSs' biggest float is 1.7E308 (DOUBLE).The differences are so large, many of the recommendations that apply for other DBMSs do not apply to Oracle. Here are some Oracle-specific recommendations instead:Avoid multiply and divide because the packed decimal numbers will have to be converted before they can be operated on.Allow free space on each page because rows are all variable-length.Don't worry about floating-point storage because it's not there.It is easier to convert decimal numbers to and from packed decimal representation than binary representation, but packed decimal is often converted to binary for arithmetic processing. |
Serials
The SQL:1999 Standard does not provide a data type for columns that contain "serial" numbers but most DBMSs do provide support for auto-incremented, or monotonic, values. Table 7-10 shows the level of support the Big Eight have for serial data types.
Data Type | Sequence Generator | |
---|---|---|
ANSI SQL | N/A | N/A |
IBM | INTEGER AS IDENTITY | N/A |
Informix | SERIAL, SERIAL8 | N/A |
Ingres | TABLE_KEY | N/A |
InterBase | N/A | GEN_ID function |
Microsoft | INTEGER IDENTITY | N/A |
MySQL | INTEGER AUTO_INCREMENT | N/A |
Oracle | N/A | CREATE SEQUENCE |
Sybase | NUMERIC(10,0) IDENTITY | N/A |
CREATE TABLE Table1 (
column1 SERIAL PRIMARY KEY,
column2 INTEGER)
Because the value of column1 increases each time you INSERT a value into column2, the uniqueness of each row is guaranteed.The problem with serials is that they can cause trouble with concurrency control; see Chapter 15, "Locks."Recommendation: Prefer INTEGER for serials and do your own value assignments.
The Bottom Line: Numbers
All DBMSs except Oracle use similar numeric formats, so look for our Oracle-specific recommendations only if you use Oracle.Any number with a scale of zero (no digits after the decimal point) should be in an INTEGER column. INTEGER is the default data type of any literal that contains all digits. SMALLINT is the only other SQL Standard data type and is usually just half the size of INTEGER. TINYINT is the smallest integer data type and is still large enough for most personal information. BIGINT is appropriate for numbers larger than two billion though working with compilers that can't handle such large numbers could be problematic.Floating-point operations are fast if they go through the computer's FPU. When you install a DBMS, the installer should detect the FPU automatically and bring in the right code, whether FPU dependent or emulated, so make sure you rerun the install program after hardware upgrades.The primary advantage of DECIMAL is that it is easy to cast to CHAR, because the number isn't stored in binary form. The primary disadvantage of DECIMAL is that it must be converted to binary form before some arithmetic operations can be performed on it.Serial data types are useful for ensuring each row of a table has a unique identifier. The problem with serials is that they can cause trouble with concurrency control.Recommendation if you don't use Oracle: Prefer INTEGER for integers unless maximum valuesincluding those from arithmetic operationsexceed the INTEGER range. Prefer DECIMAL for fixed-decimal numbers and for most floats. Prefer DOUBLE PRECISION for floats where DECIMAL isn't appropriate. Prefer INTEGER for serials and do your own value assignments.Recommendation if you use Oracle: Data type doesn't really matter because all numbers are stored the same. Avoid multiply and divide because the packed decimal numbers will have to be converted before they can be operated on. Allow free space on each page because rows are all variable-length. Don't worry about floating-point storage because it's not there.