10.6 Data Encryption
In applications that store
sensitive data, such as banking records, you may want the data to be
stored in an encrypted format. Doing so makes it very difficult for
someone to use the data even if they walk up to your server and take
it home. A full discussion of the relative merits of encryption
algorithms and techniques is beyond the scope of this book.
10.6.1 Hashing Passwords
In less sensitive
applications, you may need to protect just a few pieces of
information, such as a password database for another application.
Passwords really shouldn't be stored in the clear,
so they are commonly encrypted in applications. But rather than use
encryption, it may be wise to follow the lead of most Unix systems
and even MySQL itself: use a hashing algorithm on the password and
store the result in your table.Unlike traditional encryption, which can be reversed, hashing is a
one-way process that can't be reversed. The only way
to determine the password that generated a particular hash value is
to use a very computationally expensive brute-force attack (trying
all possible combinations of input).MySQL provides four functions for hashing passwords:
PASSWORD( ), ENCRYPT(
),
SHA1( ), and MD5(
).[12] The best way to see the results of each
function is to try each one on the same source text.
Let's see how the string pa55word
hashes in each:[12] MySQL's
ENCRYPT( ) simply calls the C
library's crypt( ) function. On
some Unix variants, crypt( ) is an MD5
implementation, making it no different from using MD5(
). On others, it is the traditional DES encryption
algorithm.
mysql> SELECT MD5('pa55word');
+----------------------------------+
| MD5('pa55word') |
+----------------------------------+
| a17a41337551d6542fd005e18b43afd4 |
+----------------------------------+
1 row in set (0.13 sec)
mysql> SELECT PASSWORD('pa55word');
+----------------------+
| PASSWORD('pa55word') |
+----------------------+
| 1d35c6556b8cab45 |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT ENCRYPT('pa55word');
+---------------------+
| ENCRYPT('pa55word') |
+---------------------+
| up2Ecb0Hdj25A |
+---------------------+
1 row in set (0.17 sec)
Each function returns a fixed-length alphanumeric string that can be
stored in a CHAR column. To cope with the
possibility of mixed-case characters in the result of
ENCRYPT( ), it's best to declare
the column CHAR BINARY.Storing hashed data is as easy as:[13][13] While you can do
it the way we describe here, there are a number of reasons why it is
much better to do the MD5 calculations on the client machine if
possible, because the clear-text password might appear in the process
list or in a query log.
INSERT INTO user_table (user, pass) VALUE ('jzawodn', MD5('pa55word') )
To verify user's password, take
the username and password supplied and run a
SELECT query to see if they match. Using a
language such as Perl or PHP, the query might look like this:
SELECT *
FROM user_table
WHERE user = '$username'
AND pass = MD5('$password')
Password hashing is an easy-to-use and relatively secure way to store
passwords in a database without them being easily recoverable.
10.6.2 Encrypted Filesystems
Because MySQL's
various table handlers all store their data as regular files on
whatever filesystem you may be using, it's possible
to use an encrypted filesystem. Most popular operating systems have
at least one encrypted filesystem available, either free or
commercial.The main advantage of this approach is that you
don't have to do anything special for MySQL to take
advantage of it. Because all the encryption and decryption takes
place outside MySQL, it just performs reads and writes without any
knowledge of what's happening under the hood. All
you need to do is make sure your data and logs are stored on the
proper filesystem. From your application's point of
view, there's nothing special about this arrangement
either.There are a few downsides to using an encrypted filesystem with
MySQL. First of all, because all the data, indexes, and logs are
being encrypted, there will be a fair amount of CPU overhead involved
in encrypting and decrypting the data. If you're
thinking about using an encrypted filesystem, be sure to perform good
benchmarks so that you understand how it behaves under heavy load.A more subtle problem with this setup occurs when you consider making
backups of your data. To copy the data to another location (disk,
tape, CD-ROM, server, etc.), the data must be decrypted. To keep the
data safe, you need to find backup software that can encrypt your
backups. The only real workaround is to take a complete dump of the
disk partition. You can safely store a copy elsewhere because the
data remains encrypted. However, there's no way to
selectively restore pieces of the data; you'd need
to restore the entire partition.
10.6.3 Application-Level Encryption
A more common approach to encryption
is to build it into the application (or middleware). When the
application needs to store sensitive data, it first encrypts the data
and stores the result in MySQL. Similarly, when the application
retrieves information from MySQL, it must decrypt it.This approach provides a lot of flexibility. It
doesn't tie you to a particular filesystem,
operating system, or even database (if your code is written in a
generic fashion). It gives the application designer the freedom to
choose an encryption algorithm that's most
appropriate (balancing speed and strength) for the data being stored.Because the data is stored encrypted, backups are very easy. No
matter where you copy the data, it is encrypted. However, it also
means that access to the data must go through software that
understands how to decrypt it. You can't just fire
up the mysql command-line tool and begin issuing
queries.Application-level encryption does have some drawbacks, though. It is
a lot harder for MySQL to effectively index the data, for example.
You may find yourself suffering from significant performance issues.
10.6.3.1 Design issues
This freedom and flexibility have interesting implications for
database design. You need to ensure that the field types you are
using are appropriate for the type of encryption
you're using. Some algorithms produce blocks of data
with fixed minimum sizes. That means you may need a column that can
hold 256 bytes just to hold a piece of data that is significantly
smaller before encryption. Many popular encryption libraries produce
binary data, so you'll need to create columns that
can store binary data. As an alternative, you can convert the binary
data to a hex or base-64 representation, but that would require more
space and time.Deciding exactly what data should and shouldn't be
encrypted isn't easy either. You need to balance
security against making the information in your tables difficult to
query. For example, you might have an account
table that represents bank accounts and contains the following
fields:idtypestatusbalanceoverdraft_protectiondate_established
Which fields make sense to encrypt? If you encrypt the balance, which
seems reasonable, it would be difficult to answer common reporting
questions. For example, you might try to write the following query to
find the minimum, maximum, and average balance of accounts of each
account type:
SELECT MIN(balance), MAX(balance), AVG(balance)
FROM account
GROUP BY type
But the results would be meaningless. MySQL has no clue what the
balance field means, so it would just try to perform those functions
on the encrypted data in the balance field.The obvious but painful solution is for your application to read all
the records from the account table and do the math
for the report you need. That may not be terribly difficult, but
it's annoying. Not only are you reimplementing
functionality MySQL already provides, you're also
slowing down the process considerably.What all this boils down to is a tradeoff between security and the
advantages of using a relational database in the first place. Any
field that contains encrypted data is basically useless to
MySQL's built-in functions because they need to
operate on the unencrypted data. Similar problems arise in query
optimization. In an unencrypted setup, you can easily find all the
accounts with a balance greater than $100,000 by doing this:
SELECT *
FROM account
WHERE balance > 100000
If there is an index on the balance field, MySQL
will probably locate the records in a split second. But if the data
is encrypted, you have to get all the records in your application and
filter them after they're decrypted.
There's just no way for MySQL to help you
out.
10.6.4 Source Code Modification
If you're
looking for a more flexible approach than either encrypted
filesystems or application-based encryption, you can always build a
custom solution. The source code for MySQL is freely available under
the GNU General Public License.This sort of work requires that you either know C++ or hire someone
who does. Beyond that, you'll be looking to create
your own table handler with native encryption support, or you might
find it easier to extend an existing table handler (the MyISAM and
BDB handlers are easiest to understand) with encryption.You'll find the relevant files in the
sql directory of the MySQL source code. Each
table handler is composed of at least two C++ files. The MyISAM
handler code, for example, is in ha_myisam.h and
ha_myisam.cc.