7.7 The Future of Replication
To address some of MySQL's current shortcomings, and to provide the
infrastructure for handling problems it hasn't yet had to face, a
number of future enhancements have been proposed. Let's look at them
briefly.
7.7.1 Eliminating the Snapshot
With MySQL's current
implementation, it's difficult to add a slave to a
master after the master has been running for a long period of time.
Many of the original binary logs have probably been removed to save
space. Without all the logs, you can't simply
configure the slave and point it at the master.Even if you have all the binary logs on the master, it may take days,
weeks, or even months for a slave to execute all the queries and
finally catch up to the master. If you're looking to
add slaves in a hurry, this clearly isn't the way to
do it.

In either case, the ideal solution is simply to configure the new
slave and tell it to begin replicating. Behind the scenes, the slave
contacts the master and requests copies of all the tables it
needs, probably using a mechanism similar to LOAD
TABLE FROM MASTER. The master
will need a way to track all changes to tables between the time that
the slave begins and finishes copying the tables. Upon completion of
the copy, the slave receives all the necessary changes and begins
replicating from the binary log.

An alternative is for all of MySQL's storage engines
to implement a versioning scheme similar to
InnoDB's. When a new slave connects and begins to
copy the tables, it can get a snapshot from that moment in time. When
the copy is complete, the slave can begin replicating from the binary
log position corresponding to the moment when the snapshot was
marked.
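The proposed flow (mark a binlog position, copy the tables, replay any changes made during the copy, then replicate normally) can be sketched as a toy model. None of the classes or methods below are real MySQL APIs; they simply mirror the steps described above.

```python
class Master:
    def __init__(self):
        self.tables = {"orders": [(1, "widget")]}
        self.binlog = []  # (table, row) change events

    def insert(self, table, row):
        self.tables[table].append(row)
        self.binlog.append((table, row))

class Slave:
    def provision(self, master):
        # 1. Note the master's binlog position before copying.
        pos = len(master.binlog)
        # 2. Copy every table (akin to LOAD TABLE FROM MASTER).
        self.tables = {t: list(rows) for t, rows in master.tables.items()}
        # 3. The master kept logging while we copied; replay what we missed.
        for table, row in master.binlog[pos:]:
            self.tables[table].append(row)
        # 4. Remember where to resume normal replication.
        self.position = len(master.binlog)

master = Master()
master.insert("orders", (2, "gadget"))
slave = Slave()
slave.provision(master)
print(slave.tables)  # an identical copy of the master's tables
```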
7.7.2 Fail-Safe Replication
When a master fails, you must
select a new master and instruct all the slaves to connect to the new
master and begin replicating. Not only is that process prone to
errors, it can be time-consuming too. Ideally, MySQL should handle
failover automatically.

The proposed solution involves each slave registering itself with the
master so that the master can keep track of it. Not only will the
master know which servers are slaves, it can also keep track of how
up to date each slave is. The slaves, in turn, will also keep track
of who all the other slaves are.

In the event that the master fails, the slaves can elect a master
based on the available information. Ideally, they will find the slave
that was the most up to date when the master went down.
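The election itself could be very simple: whichever slave has read furthest into the failed master's binary log wins. The coordinates below are invented for illustration (real slaves expose similar log-file/position pairs via SHOW SLAVE STATUS).

```python
# Hypothetical registry of slaves and how far each one had read in
# the failed master's binary log: (log_file, log_position).
slaves = {
    "slave1": ("binlog.042", 1107),
    "slave2": ("binlog.043", 210),   # a newer log file: furthest along
    "slave3": ("binlog.042", 9934),
}

def elect_new_master(slaves):
    # Compare (file, position) pairs lexicographically; the largest
    # pair belongs to the most up-to-date slave.
    return max(slaves, key=lambda name: slaves[name])

print(elect_new_master(slaves))  # slave2
```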
7.7.3 Safe Multi-Master Replication
Today it's
possible to use replication in a multi-master architecture, as
depicted earlier (see Figure 7-3). The major
drawback to doing so, however, is that you can't
rely on AUTO_INCREMENT columns to function
properly.

Each MyISAM table has a single counter that controls the next
AUTO_INCREMENT value. Once that value has
increased, it can't easily be decreased. If inserts on the two
masters are interleaved in just the wrong order, the data becomes
inconsistent between the two masters.

Imagine the following events occurring on two servers,
master1 and master2:

1. Both servers start with an empty orders table.
2. master1 inserts a record for customer 58, which is assigned ID 1.
3. master2 inserts a record for customer 1232, which is also
   assigned ID 1.
4. master2 replicates master1's insert, adding the record for
   customer 58 and trying to assign it an ID of 1. That fails and
   results in a duplicate key error.
5. master1 replicates master2's insert, adding the record for
   customer 1232 and trying to assign it an ID of 1. That fails and
   results in a duplicate key error.
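This collision is easy to reproduce in miniature. The sketch below uses SQLite to stand in for the two MySQL masters, assuming each insert replays on the peer with the ID it was originally assigned:

```python
import sqlite3

def new_master():
    # An in-memory database standing in for one MySQL master.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER)")
    return db

master1, master2 = new_master(), new_master()

# Each master assigns ID 1 to its own client's insert.
master1.execute("INSERT INTO orders VALUES (1, 58)")
master2.execute("INSERT INTO orders VALUES (1, 1232)")

# Replaying the peer's insert then collides on the key.
errors = []
for db, customer in ((master2, 58), (master1, 1232)):
    try:
        db.execute("INSERT INTO orders VALUES (1, ?)", (customer,))
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)  # both replays fail with a duplicate-key error
```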
Each master was given an insert by some client before it had
replicated the other master's insert. The result is
that both masters are out of sync.

The current solution is to avoid using
AUTO_INCREMENT fields completely and assign
primary keys through some other means. You might use an MD5 hash of
some values in the record, or perhaps use another library to generate
a globally unique identifier (GUID).Let's look at the two proposed solutions for the
future.
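Both workarounds are straightforward to implement with Python's standard library; the record fields below are made up for the example.

```python
import hashlib
import uuid

# An example record whose fields we hash; any distinguishing values
# from the row would do.
record = {"customer_id": 58, "item": "widget", "ts": "2003-06-01 12:00:00"}

# Option 1: an MD5 hash of values in the record. Both masters derive
# the same key for the same record, so replication doesn't collide.
key1 = hashlib.md5(repr(sorted(record.items())).encode()).hexdigest()

# Option 2: a random GUID/UUID.
key2 = uuid.uuid4().hex

print(key1, key2)  # two 32-character hex strings
```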
7.7.3.1 Multipart auto-increment unique keys
The first is to use MyISAM's multipart
auto-increment unique keys. Rather than using a single column as a
primary key, you'd set up a table like this:
CREATE TABLE orders (
server_id INTEGER UNSIGNED NOT NULL,
record_id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
stuff VARCHAR(255) NOT NULL,
UNIQUE mykey (server_id, record_id)
);
Notice that the record_id is an
AUTO_INCREMENT field and is the second part of a
two-part unique key. When you insert NULL into the
record_id column, MySQL will consider the value of
server_id when automatically generating a value.

To illustrate this, notice the following:
mysql> insert into orders values (1, NULL, 'testing');
Query OK, 1 row affected (0.01 sec)
mysql> insert into orders values (1, NULL, 'testing');
Query OK, 1 row affected (0.00 sec)
mysql> insert into orders values (2, NULL, 'testing');
Query OK, 1 row affected (0.00 sec)
mysql> select * from orders;
+-----------+-----------+---------+
| server_id | record_id | stuff   |
+-----------+-----------+---------+
|         1 |         1 | testing |
|         1 |         2 | testing |
|         2 |         1 | testing |
+-----------+-----------+---------+
3 rows in set (0.03 sec)
MySQL, in effect, allows you to select from multiple
AUTO_INCREMENT sequences based on the prefix you
use. By adding a function such as SERVER_ID( ) to
MySQL and rewriting the previous queries, you can use
AUTO_INCREMENT with multi-master replication
safely.
mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');
Query OK, 1 row affected (0.01 sec)
mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');
Query OK, 1 row affected (0.00 sec)
mysql> insert into orders values (SERVER_ID( ), NULL, 'testing');
Query OK, 1 row affected (0.00 sec)
There are three problems with this approach. First, it works only for
MyISAM tables. An ideal solution works across all table types.
Another issue is that all slaves require some special logic. Today,
when a slave reads the binary log of a master, it knows the
master's server ID as well as its own, but it
doesn't really do anything with the
master's server ID. In this solution, the slave has
to actually use the master's server ID any time that
it replicates a query that involves the mythical SERVER_ID(
) function. That makes the replication logic a bit trickier
on the slaves.

You could work around the lack of a SERVER_ID( )
function by simply using the actual server ID in your SQL statements.
If you know you're talking to server 12, write the
query accordingly:
mysql> insert into orders values (12, NULL, 'testing');
Query OK, 1 row affected (0.01 sec)
But there's the rub. You need to know, in advance of
each query, what the server's ID is. Granted, the
server's ID doesn't change, but if
you're accessing one of many servers via a load
balancer or don't have a persistent connection, the
server you're talking to may change often. So
you'd have to deal with the overhead of obtaining
the server's ID whenever you need it.
mysql> show variables like 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id     | 102   |
+---------------+-------+
1 row in set (0.00 sec)
Finally, and most importantly, using two columns as the primary key
just doesn't feel natural. It feels like a hack or a
workaround. If this solution became widespread, other problems might
arise. For example, setting up foreign-key relationships would be
troublesome. Putting aside the fact that InnoDB
doesn't even support multipart auto-increment unique
keys, how would you define a foreign-key relationship with multipart
keys?
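In effect, MyISAM keeps an independent counter for each distinct server_id prefix. A minimal model of that behavior:

```python
from collections import defaultdict

# One AUTO_INCREMENT sequence per server_id prefix.
counters = defaultdict(int)

def insert(server_id):
    # Generating record_id depends on the server_id in the same row.
    counters[server_id] += 1
    return (server_id, counters[server_id])

rows = [insert(1), insert(1), insert(2)]
print(rows)  # [(1, 1), (1, 2), (2, 1)] -- matches the mysql session earlier
```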
7.7.3.2 Partitioned auto-increment fields
The second solution is to make auto-increment fields a bit more
complex. Rather than simply using a 32-bit integer that starts at 1
and keeps counting, it might make sense to use more bits and
partition the key-space based on the server ID. Currently, server IDs
are 32-bit values, so by using a 64-bit auto-increment value, the two
can be combined. The high 32 bits of the value would be the server ID
of the server that originally generated the record, and the low 32
bits would be the real auto-increment value.

Internally, MySQL needs to treat the 64-bit auto-increment value a
lot like the multipart auto-increment unique keys previously
discussed. The value generated for the low 32 bits is dependent on
the value of the high 32 bits (the server ID). The benefit is that
from the user's point of view, it's
a single column and can be used just like any other column. Insert
statements are no more complex; all the magic is conveniently under
the hood, where it belongs.

There are some downsides to this approach, however. The most apparent
issue is that there would be large gaps in the values. For the sake
of simplicity, MySQL can always subtract 1 from the server ID when
generating the high bits of the auto-increment value. This allows
values to continue starting at 1 when the server ID is 1. However, as
soon as a second server is introduced, with server ID 2, it inserts
values starting from 4,294,967,297 (2^32 + 1) and counting up from
there.

Another problem is that columns will require more space on disk (both
in the data and index files). BIGINT columns are
already 8 bytes (64 bits) wide. Adding another 4 bytes (32 bits) for
the server ID portion of the auto-increment value means a 50%
increase in the space required. That may not sound like a lot, but an
application that requires 64-bit values in the first place is likely
to be storing billions of rows. Adding an additional 4 bytes to a
table containing 10 billion rows means storing an additional 40 GB of
data!

It makes sense to break compatibility with existing MySQL versions
(which use 32-bit server IDs) and reduce the size of the server ID to
8 or 16 bits. After all, with even 8 bits available, you can have up
to 255 unique servers in a single replication setup; with 16 bits,
that jumps to 65,535. It's unlikely anyone will have
that many servers in a single replication setup.[9][9] Perhaps Google will decide to run MySQL on their growing farm
of 100,000+ Linux servers. They'd need more than 8
bits.
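The bit layout this section describes is easy to sketch: the high 32 bits carry the server ID (minus one, so server 1's values still start at 1), and the low 32 bits carry the per-server counter.

```python
def make_id(server_id, counter):
    # High 32 bits: (server_id - 1); low 32 bits: the counter.
    return ((server_id - 1) << 32) | counter

# Server 1 keeps generating small, familiar values...
assert make_id(1, 1) == 1
assert make_id(1, 2) == 2

# ...but server 2's values start at 2^32 + 1 = 4,294,967,297.
print(make_id(2, 1))  # 4294967297

# Splitting a value back into its parts:
value = make_id(2, 7)
server_id = (value >> 32) + 1
counter = value & 0xFFFFFFFF
assert (server_id, counter) == (2, 7)
```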