SQL Bible [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

SQL Bible [Electronic resources] - نسخه متنی

Alex Kriegel

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید






No Strings Attached

Generally all strings could be divided into
character strings (to store plain text) and binary strings which contain either
machine code (computer programs) or special binary instructions for other
programs.


Character strings


A character string can be defined simply
as a sequence of bytes. The length of the string is the number of bytes in the
sequence. A string of zero length is called an empty
string.
It can be an equivalent to
NULL (special concept introduced at the
end of this chapter) or not, depending on implementation. SQL99 specifically
differentiates between empty strings and nulls.

All strings in SQL can be of fixed length
or varying length. The difference is quite simple, but sometimes not very easy
to understand for people with no technical background, so let us explain it in
some greater detail.

Fixed-length character
strings


If you define
string to be of a fixed length, the system preallocates the desired number of
bytes in memory and/or computer hard disk. It does not matter if the actual
value to be stored in that string is exactly that many bytes, twice smaller, or
just one character long — it is still going to occupy the whole allocated space
(unused bytes will be padded with blank characters), so all strings will have
exactly the same length. For example, imagine you defined a column
DATABASE as having a character field of
length 13, and now we want to store three strings in that column:
ORACLE,
UDB2, and
MS
SQL
SERVER.
Figure
3-1
illustrates the results of that operation. Note that all strings
are exactly 13 bytes long.


Figure 3-1: Fixed-length
character string storage

Character strings of varying
length


If you define
DATABASE column as a varying-length
string with maximum 13 characters to store, the picture will be different. The
actual memory or disk space required to hold our values is allocated
dynamically. As a result, all three strings will be of different length and
will require different numbers of bytes to hold them. String
ORACLE occupies 6 bytes,
DB2 UDB — 4 bytes and
MS
SQL
SERVER takes maximum allowed 13 bytes.
(See
Figure
3-2
.)


Figure 3-2: Varying-length
character string storage

Here is the general platform-independent
recommendation: use a fixed-length data type when your values are expected to
be of the same size, and a varying-length one for values when size is expected
to vary considerably. In the example above it rather looks logical to use the
varying-length strings, but if we need a column to store, say, gender in form
M or
F, a fixed-length string is more
appropriate.

National character strings


Even though English is a very popular
language, it is not the only language on Earth. And in spite of the fact that
practically all major software companies reside in the United States, the
market dictates its own rules. For example, Oracle is a very popular database
vendor around the world with customers in China, India, Korea, Germany, France,
Israel, Russia, Saudi Arabia, and many other countries.





Note

Oracle has customers in 145 countries. But would that be the
case if the English language was the only option customers were able to use?
Most likely, the answer is "no." Customers have to be able to store and use
information in their native language — otherwise they would rather use some
less-efficient and/or more expensive DBMS vendor who provides that
option.


Now we have a little bit of a problem.
So far we've used terms "character" and "byte" as synonyms. Byte is a computer
term for a unit of information storage that consists of 8 bits. Each bit can
either be 1 or 0 and the combination of 8 bits allows us to store 256
(28)
distinct values (or 256 different characters represented by numbers from 0 to
255). That looks like a lot, but ... not actually. We need separate holders for
uppercase and lowercase letters, punctuation marks, digits, math symbols, and
so on. That barely leaves space for distinct characters used in other languages
that employ the Latin alphabet. And what about the ones which don't? There are
about 3,000 different languages in the world, dead and living, in addition to
constructed languages like J.R.R. Tolkien's Quenya, Sindanin, or Entish, and
most of them have their own distinct alphabets!





Note

ASCII (American Standard Code for Information Interchange) was
published in 1968 as a standard of ANSI. It uses the aforementioned 256 holders
to store different characters, and it remains a useful standard to this
day.


The solution seems to be rather
intuitive — use two bytes per character instead of one. That allows 65,535
(216)
distinct combinations, which is enough to store all existing characters from
every major language on Earth.

SQL has two data types to store strings
in national characters — national character string and national character
string of varying length — that behave in exactly same way as previously
described character string and character string of varying length
correspondingly, but use the two-byte Unicode standard. So, if you declared
your DATABASE column as a national character field of size 13, it would still
hold 13 characters, but would reserve 2 bytes for each letter, for a total of
26 bytes. The difference is, now it can hold the names from previous examples
spelled in practically any language, for example, in Russian.
Figure
3-3
illustrates that.


Figure 3-3: Fixed-length Unicode
character storage

Figure 3-4 shows same concept for national
characters of varying length.


Figure 3-4: Varying-length
Unicode character storage








Introducing Unicode

The ultimate successor to ASCII is
Unicode. It is a standard double-byte character set that
assigns a unique number to every single character, so all of them can be
represented in one character set. To speak in database terminology, the Unicode
character number serves as a primary key to index virtually all the world's
characters. (Another good example of database use!) The Unicode standard has
been adopted by such industry leaders as Apple, HP, IBM, Microsoft, Oracle,
SAP, Sun, Sybase, Unisys, and many others. Unicode is required by modern
standards such as XML, Java, JavaScript, CORBA, WML, HTML, and is the official
way to implement ISO standard 10646. It is supported in many operating systems,
all modern browsers, major RDBMS vendors, and many other products. The
emergence of the Unicode standard and the availability of tools supporting it
are among the most significant recent global software technology trends.















Note

In Russian Oracle is spelled with only five characters rather
than six in English, so only five memory (or hard disk) holders are occupied,
but now each holder is two bytes long.


Let's talk about SQL99 standards and
implementation specifics for all types of character strings. These are
summarized in
Table
3-1
.




















Table 3-1: Major Vendor Implementations Character String
Data Types


SQL99


Oracle
9i


DB2 UDB
8.1


MS SQL SERVER
2000


CHAR[ACTER]

[(n)]

CHAR[ACTER]

VARYING(n) OR
VARYING(n)

CLOB


CHAR[ACTER]

[(n)]

CHAR[ACTER]

VARYING(n)

VARCHAR(n)

VARCHAR2(n)

LONG
[VARCHAR]

CLOB


CHAR[ACTGR]

[(n)]

CHAR[ACTER]

VARCHAR(n)

VARCHAR(n)

LONG
VARCHAR


CHAR[ACTER][(n)]

CHAR[ACTER]

VARYING[(n)]

VARCHAR[(n)]

TEXT


NATIONAL
CHAR[ACTER] [(n)]OR NCHAR[(n)] OR
CHARACTER[(n)] CHARACTER SET
<char_set_name>


NATIONAL
CHAR[ACTER] [(n)]

NCHAR[(n)]


GRAPHIC[(n)]


NATIONAL
CHAR[ACTER][(n)] NCHAR[(n)]


NATIONAL
CHAR[ACTER] VARYING(n) OR NCHAR
VARYING(n) OR CHARACTER VARYING(n) CHARACTER SET
<char_set_name>


NATIONAL
CHAR[ACTER] VARYING[(n)] NCHAR VARYING(n) NVARCHAR2(n)
NCLOB


VARGRAPHIC(n) LONG VARGRAPHIC(n)
DBCLOB(n)


NATIONAL
CHAR[ACTER] VARYING(n) NCHAR VARYING[(n)]
NVARCHAR[(n)]


SQL99


SQL99 has two major character sets:
CHARACTER and
CHARACTER
VARYING. In addition, there are also
NATIONAL
CHARACTER and
NATIONAL
CHARACTER
VARYING.

CHARACTER can also be abbreviated with
CHAR. The size can optionally be
specified in the form
CHARACTER(n). For example,
CHARACTER(15) can hold character
strings up to 15 characters long. If size is omitted, the default is 1. An
error occurs if one tries to store a string that is bigger than the size
declared.

CHARACTER
VARYING can be abbreviated with
CHAR
VARYING or
VARCHAR. You have to specify the
maximum size for strings to be stored, for example,
CHARACTER
VARYING(15) holds 15-character
strings, or smaller.

NATIONAL
CHARACTER (NATIONAL
CHAR,
NCHAR,
CHARACTER
CHARACTER
SET
<char_set_name>) specifies the
default data type for the country of implementation. This is a fixed-length
character string data type.

NATIONAL
CHARACTER
VARYING (NATIONAL
CHAR
VARYING,
NCHAR
VARYING,
CHARACTER
VARYING
CHARACTER
SET
< char_set_name>,
CHAR
VARYING
CHARACTER
SET
< char_set_name>) is a
varying-length country-specific character string data type.

CLOB is
a new SQL99 data type to store large nondatabase-structured text objects of
varying size and complexity, such as employees' resumes, collections of papers,
books, and other similar data.

Oracle
9i

Oracle is fully compliant with SQL99
standards for character strings.



CHAR is used for fixed-length
strings. The default length for a
CHAR column is 1 byte with a
maximum of 2,000 bytes.



VARCHAR2 is an Oracle data
type to store varying-length character strings. It does not have the default
length, so you have to specify a value from 1 to 4,000 (maximum number of bytes
for
VARCHAR2).



NCHAR and
NVARCHAR2 are used to store
fixed-length and varying-length national character strings. Beginning with
Oracle9i, they were redefined to be Unicode-only data
types and can hold up to 2,000 and 4,000 characters (not
bytes!) correspondingly. That means if you declare a column to be
CHAR(100) it will allocate 100
bytes per column, but
NCHAR(100) Unicode-based column
requires 200 bytes.





Note

The
VARCHAR data type in Oracle is
currently a synonym to
VARCHAR2. If you declare a
column as
VARCHAR(30), it will be
converted it to
VARCHAR2(30) automatically.
Oracle does not recommend the use of
VARCHAR as a data type, but
rather recommends
VARCHAR2 instead because
keyword
VARCHAR may be later used in
some different way.




CLOB and
NCLOB can store up to four
gigabytes of data in Oracle. Both fixed-length and variable-length character
sets are supported.
CLOB uses the
CHAR database character set, and
NCLOB stores Unicode data using
the national character set.



LONG is an old Oracle data
type to store variable-length character strings containing up to two gigabytes.
It is similar to
VARCHAR2, but has many
limitations. For example, you cannot use
LONG in the
WHERE clause of a
SELECT statement (discussed in
Chapter
8
), a table can't have more than one
LONG column, it can't be indexed,
and so on. Oracle strongly recommends to discontinue the use of the
LONG data type and use
CLOB instead.





Note

Oracle has synonyms for SQL99 compatibility. For example,
you can use
CHARACTER(100) rather than
CHAR(100) or
CHARACTER
VARYING rather than
VARCHAR2 to attain the same
results. See
Table
3-1
for more details.




DB2 UDB 8.1


DB2 has
following character string data types:



CHARACTER is compliant with
SQL99 standards. The maximum length is 254 characters. The default length is
1.



VARCHAR is used for
varying-length strings and has a maximum of 32,672 characters.



LONG
VARCHAR is virtually same as
VARCHAR, but can hold larger
values (up to 32,700) and can't be limited to a certain number of
characters.



CLOB types are SQL99
compliant varying-length strings of up to two gigabytes. An optional maximum
length can be supplied in kilobytes (K|k), megabytes (M|m), or gigabytes (G|g).
For example,
CLOB (10M) would allow maximum of
10,048,576 characters.



GRAPHIC is a rough DB2
equivalent to
NATIONAL
CHARACTER. It is a double-byte
data type, which may range from 1 to 127 characters. If the length
specification is omitted, a length of 1 is assumed.



VARGRAPHIC is a
varying-length double-byte character string data type, comparable to SQL99
NATIONAL
CHARACTER
VARYING. The range is from 1 to
16,336.



LONG
VARGRAPHIC is similar to
VARGRAPHIC with a maximum length
of 16,350. It does not have an optional length limit to be supplied by
user.



DBCLOB is a double-byte
equivalent to
CLOB. Maximum storage is one
gigabyte of character data.
DBCLOB accepts a maximum length
in the same way as
CLOB.





Note

GRAPHIC,
VARGRAPHIC, and
DBCLOB data types are not
supported in the Personal Edition of DB2 supplied with your book.




See
Table
3-1
for more information.

MS SQL Server 2000


The following character string data
types are supported by MS SQL Server:



CHAR and
VARCHAR are used for fixed-length
and variable-length character data correspondingly. The maximum length is 8,000
characters. Unlike Oracle, you don't have to specify length for
VARCHAR — it defaults to 1 like
CHAR.



TEXT is similar to
VARCHAR, but can hold much larger
values. Its maximum length is two gigabytes or 231 – 1
(2,147,483,647) characters.



NCHAR and
NVARCHAR, and
NTEXT are Unicode equivalents to
CHAR,
VARCHAR, and
TEXT.
NCHAR and
NVARCHAR can hold up to 4,000
characters;
NTEXT is much larger — one
gigabyte or 230 – 1
(1,073,741,823) characters.



For SQL99 compatibility synonyms see
Table
3-1
. If one data type has more than one name (or synonym) the most
widely used name is given in italics.

Character string literals


The terms literal
and constant refer to a fixed data value, for
instance

'Frozen Margarita' 'ALEX'
'2003/08/07' '10101101'

are all character literals. Character
literals are enclosed in single quotes. To represent one single quotation mark
within a literal, you can enter two single quotation marks:

'O''Neil'

Character literals are surprisingly
consistent between all our three major vendors, with only slight variations.
For example, MS SQL Server allows double quotes for character literals instead
of single ones if the option
QUOTED_IDENTIFIER is set off for a
connection. To represent a national character set literal, it has to be
preceded by capital letter
N (DB2 understands
G in addition to
N):

N'Jack Smith' N'Boris M.
Trukhnov' N'123 OAK ST.'

Text entered using this notation is
translated into the national character set.


Binary strings


A binary
string is a sequence of bytes in the same way that a character string is, but
unlike character strings that usually contain information in the form of text,
a binary string is used to hold nontraditional data such as images, audio and
video files, program executables, and so on. Binary strings may be used for
purposes similar to those of character strings (e.g., to store documents in MS
Word format), but the two data types are not compatible; the difference being
like text and a photo of the same text. Binary string data types are summarized
in
Table
3-2
.























Table 3-2: Binary String Data Types


SQL99


Oracle 9i


DB2 UDB
8.1


MS SQL SERVER
2000


BIT


BINARY[(n)]


BIT
VARYING


RAW(n)


VARBINARY[(n)]


LONG
RAW


BLOB


BLOB


BLOB(n)


IMAGE









Character vs. Special Files

It might sound a little bit confusing —
why plain text documents can be stored as character strings, and a Word
document has to be treated as a binary string. The thing is, a Word file is a
text document from user's point of view, but from computer storage perspective
it is not. In addition to plain text characters it contains many special signs
and instructions that only MS Word software can interpret. The same is true for
any other special files — bitmaps, spreadsheets, audio and video files, and so
forth. You can think of it in this way: a special file (e.g., of the DOC, XLS,
BMP, or AVI type) is like a tape for VCR, whereas a program (MS Word, Excel,
Paint, QuickTime Player) is like a VCR. You have to have a VCR to play a tape,
and it has to be the right VCR — if you try to play a standard US VHS tape in
NTSC format on a European video recorder (PAL format), it's not going to work.
You might see some blinking on your screen, you will hear some noise, but you
will definitely not be able to watch the movie. Just try to open a Word file
with, say, Notepad and you will see what we are talking about.











SQL99


SQL99 has following data types to store
binary strings:
BIT,
BIT
VARYING, and
BLOB.



BIT is a fixed-length binary
string somewhat similar to
CHAR. If you declare a column to be
BIT(100), 100 bytes will be allocated
in memory/disk, and if the object you store is just 60 bytes, it's still going
to occupy all 100 bytes.



BIT
VARYING is similar to
VARCHAR — even if you specify
BIT
VARYING(100) to be the data type
lasting the previous example, it will only take 60 bytes to store the
object.



BLOB is a binary equivalent to
CLOB.



Oracle
9i


Oracle doesn't have an equivalent to
SQL99 BIT, but has two data types that correspond to
BIT
VARYING —
RAW and
LONG
RAW.
BLOB data type is also supported.



RAW can hold a maximum of 2,000
bytes. The size has to be specified.



LONG
RAW can accumulate up to two
gigabytes of data. This data type is obsolete, and Oracle strongly recommends
converting it to
BLOB.



BLOB can store up to four
gigabytes of binary data in Oracle.



DB2
UDB 8.1


The only data type for binary strings in
DB2 is
BLOB, which can be up to 2 gigabytes
long.

MS SQL Server
2000


MS SQL Server has three different data
types for binary strings:
BINARY,
VARBINARY, and
IMAGE.



BINARY is a fixed-length data
type to store binary data. The size can be specified from 1 to 8,000; the
actual storage volume is size
+ 4 bytes.



VARBINARY can hold
variable-length binary data. The size is from 1 through 8,000. Storage size is
the actual length of the data entered
+ 4 bytes. The data entered can be 0
bytes in length.



IMAGE is a variable-length binary
data type that can hold from 0 through 2,147,483,647 bytes (two gigabytes) of
data.



Binary string literals


MS SQL Server allows literals for
binary string fields (BINARY,
VARBINARY,
IMAGE) either in the form of
hexadecimal numbers prefixed with
0x or as binary strings. The value has
to be unquoted:

0xAE
0101010010100110

MS SQL Server implicitly converts these
literals into appropriate binary format. Oracle and DB2 don't have binary
string literals; the values have to be converted into proper format using
special functions (see
Chapter
10
).





Note

Literals are barely needed for large objects that can store
gigabytes of data. In most cases LOBs are not manipulated by traditional SQL
statements, but rather accessed by special programs and interfaces that know
how to handle such objects without reading them directly into memory.


/ 207