Chapter 1: SQL and Relational Database Management Systems (RDBMS)
Overview
Information
may be the most valuable commodity in the modern world. It can take many
different forms — accounting and payroll information, information about
customers and orders, scientific and statistical data, graphics–to mention just
a few. We are virtually swamped with data. And we cannot — or at least we'd
like to think about it this way — afford to lose it, but these days we simply
have too much data to keep storing it in file cabinets or cardboard boxes. The
need to safely store large collections of persistent data, efficiently "slice
and dice" it from different angles by multiple users and update it easily when
necessary is critical for every enterprise. That need mandates the existence of
databases, which accomplish all the tasks listed above, and then some. To put
it simply, a database is just an organized collection of
information — with emphasis on organized. A more specific definition often used as a
synonym for "database" is database management system
(DBMS). That term is wider and, in addition to the stored information, includes
some methods to work with data and tools to maintain it.
Note | DBMS can be defined as a collection of interrelated data plus a set of programs to access, modify, and maintain the data. More about DBMS later in this chapter. |
Desirable database
characteristics
There are many ideas about what a database
is and what it should do. However, all modern databases should have at least
the following characteristics.
Sufficient capacity
A database's primary function is to store
large amounts of information. For example, an order management system for a
medium-sized company can easily grow into gigabytes of data; the bigger the
company, the more data it needs to store and rely upon. A company that wants to
keep historical (archival) data will require even more storage space. The need
for storage capacity is growing rapidly, and databases provide for
structured storage.
Adequate security
As was noted previously, enterprise data
is valuable and must be stored safely. That means protection of the stored data
not only from malicious or careless human activities, such as unauthorized
logins, accidental information deletions/modifications, and so on, but also
from hardware failures and natural disasters.
Multiuser environment
It's also important to note that in
order to be useful, the information stored in a database must be accessible to
many users simultaneously at different levels of security, and, no matter what,
the data must stay consistent. For example, if two users try to change the same
piece of information at the same time, the result can be unpredictable (e.g.,
data corruption), so situations like that have to be handled appropriately by
internal database mechanisms. Also, certain groups of users may be allowed to
modify several pieces of information, browse other parts of it, and be
prevented from even viewing yet another part. (Some company data can be
strictly confidential with a very restricted access.)
Effectiveness
Users need quick
access to the data they want. It is very important not only to be able to store
data, but also to have efficient algorithms to work with it. For example, it
would be unacceptable for users to have to scroll through each and every record
to find just one order among millions stored in the database; the response to
someone's querying the database must be fast, preferably instantaneous.
Note | As an analogy, suppose you wanted to find all the occurrences of the word "object" in a book. You could physically browse through the entire book page by page until you reach the end. Or you could use the index and determine that the word is used on pages 245, 246, and 348. This situation is comparable to using bad or good programming algorithms. |
Scalability
Databases must be flexible and easily
adaptable to changing business needs. That primarily means that the internal
structure of database objects should be easily modified with minimum impact on
other objects and processes; for example, to add a field in a legacy database
you would have to bring the whole dataset offline, that is, make it
inaccessible to users, modify it, change and recompile related programs, and so
on. We'll talk more about that in the "Database Legacy" section of this
chapter.Another scalability aspect is that data
typically lives longer than the hardware and software used to access and
manipulate it, so it would not be very convenient to have to redesign the
entire database to accommodate the current "flavor-of-the-month" development
environment; for example, in case of a takeover or when company management
suddenly decides to switch production environment from Java to C#.
User-friendliness
Databases are not just for programmers
and technical personnel (some would say not for programmers — period).
Nontechnical users constitute the majority of all database users nowadays.
Accountants, managers, salespeople, doctors and nurses, librarians, scientists,
technicians, customer service representatives — for all these and many more
people, interaction with databases is an integral part of their work. That
means data must be easy to manipulate. Of course, most users will access it
through a graphical user interface with a predefined set of screens and limited
functionality, but ad-hoc database queries and reports become more and more
popular, especially among sophisticated, computer-literate users.
Note | Consider this. An order management application has a screen to view all orders and another window to browse customers. It can also generate a number of reports, including one to analyze orders grouped by customer. But accountant Jerry is working on a report for his boss and needs to find the ten customers with the highest debt. He can request a new report from the IT department, but it will take days (or even weeks) because of bureaucratic routine, programmers' busyness, or something else. The knowledge of SQL can help Jerry to create his own ad-hoc query, get the data, and finish his report. |