High Performance MySQL [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

High Performance MySQL [Electronic resources] - نسخه متنی

Derek J. Balling

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید








7.3 Under the Hood


What really happens during
replication? What does the binary log contain?
What's different in Version 4.0? To help answer
those questions, let's get deeper into the details
and then walk through the steps that MySQL performs during
replication. We'll start with an insert on the
master and follow it to completion on the slave.
We'll also look at how MySQL 3.23 and 4.x differ.


7.3.1 Replication in 3.23


MySQL's original
replication code provides basic replication services. The master logs
all write queries to the binary log. The slave reads and executes the
queries from the master's binary log. If the two are
ever disconnected, the slave attempts to reconnect to the master.

If you follow a query from start to finish, here's
what's happening behind the scenes:

The client issues a query on the master.

The master parses and executes the query.

The master records the query in the binary log.

The slave reads the query from the master.

The slave parses and executes the query.

The slave performs a sanity check, comparing its result with the
master's. If the query failed on the slave but
succeeded on the master, replication stops. The reverse is also true.
If the query partially completed on the master but succeeds on the
slave, the slave stops and complains.

The slave updates the master.info file to
reflect the new offset at which it is reading the
master's binary log.

The slave waits for the next query to appear in the
master's binary log. When one appears, it starts
over at Step 4.


That's a relatively simple arrangement. The master
simply logs any queries that change data. The slave reads those
queries from the master, one by one, and executes each of them. If
there are any discrepancies between the results on the master and the
slave, the slave stops replicating, logs an error, and waits for
human intervention.

The simplicity of this system has problems, however. If the master
and slave are separated by a slow network, the speed at which
replication can occur becomes bounded by the network latency. Why?
Because the process is highly serialized. The slave runs in a simple
"fetch query, execute query, fetch query,
..." loop. If the "fetch
query" half of the loop takes more than a trivial
amount of time, the slave may not be able to keep up with the master
during very heavy workloads. The master may be able to execute and
log 800 queries per second, but if the slave requires 25 msec to
fetch each query over the network, it can replicate no more than 40
queries per second.

This can be problematic even with a fast network connection. Suppose
the master executes a query that takes five minutes to complete.
Maybe it's an UPDATE that affects
50 million records. During the five minutes the slave spends running
the same query, it isn't pulling new queries from
the master. By the time it completes the query, it's
effectively five minutes behind the master, in terms of replication.
It has a fair bit of catching up to do. If the master fails during
that five-minute window, there's simply no way for
the slave to catch up until the master reappears. Some of these
problems are solved in 4.0.


7.3.2 Replication in 4.0


To
solve the problem of slaves falling behind because of slow queries or
slow networks, the replication code was reworked for Version 4.0.
Instead of a single thread on the slave that runs in a
"fetch, execute, fetch, ..." loop,
there are two replication threads: the IO
thread
and the SQL thread.

These two threads divide the work in an effort to make sure the slave
can always be as up to date as possible. The IO thread is concerned
only with replicating queries from the master's
binary log. Rather than execute them, it records them into the
slave's relay log.[7] The SQL thread reads queries from the local relay log and
executes them.

[7] To keep things simple, the relay log file uses the same storage
format as the master's binary log.


To put this in context, let's look at the
step-by-step breakdown for replication in MySQL 4.0:

The client issues a query on the master.

The master parses and executes the query.

The master records the query in the binary log.

The slave's IO thread reads the query from the
master and appends it to the relay log.

The slave's IO thread updates the
master.info file to reflect the new offset at
which it is reading the master's binary log. It then
returns to Step 4, waiting for the next query.

The slave's SQL thread reads the query from its
relay log, parses it, and then executes it.

The slave's SQL thread performs a sanity check,
comparing its result with the master's. If the query
failed on the slave but succeeded on the master, replication stops.

The slave's SQL thread updates the
relay-log.info file to reflect the new offset at
which it is reading the local relay log.

The slave's SQL thread waits for the next query to
appear in the relay log. When one appears, it starts over at Step 6.


While the steps are presented as a serial list, it's
important to realize that Steps 4-5 and 6-9 are running as separate
threads and are mostly independent of each other. The IO thread never
waits for the SQL thread; it copies queries from the
master's binary log as fast as possible, which helps
ensure that the slave can bring itself up to date even if the master
goes down. The SQL thread waits for the IO thread only after it has
reached the end of the relay log. Otherwise it is working as fast as
it can to execute the queries waiting for it.

This solution isn't foolproof. It's
possible for the IO thread to miss one or more queries if the master
crashes before the thread has had a chance to read them. The amount
of data that could be missed is greatly reduced compared to the 3.23
implementation, however.


7.3.3 Files and Settings Related to Replication


There are several files and
configuration options related to replication in this chapter. Without
going into a lot of detail on any one of them
(that's done elsewhere), the files fall into three
categories: log files, log index files, and status files.


7.3.3.1 Log files


The
log files are the binary log and the relay log. The binary log
contains all write queries that are written when the log is enabled.
The log-bin option in my.cnf
enables the binary log. Binary log files must be removed when
they're no longer needed because MySQL
doesn't do so automatically.

The relay log stores replicated queries from a MySQL 4.0 slave (from
the master's binary log) before it executes them.
It's best thought of as a spool for queries. The
relay log is enabled automatically in 4.0 slaves. The
relay-log option in my.cnf
can customize the name and location of the relay
log's base filename:

relay-log = /home/mysql/relay.log

Like the binary log, MySQL always appends a sequence number to the
base name, starting with 001. Unlike the binary log, MySQL takes care
of removing old relay logs when they are no longer needed. MySQL 3.23
servers don't use relay logs.


7.3.3.2 Log index files


Each log file has a corresponding index
file. The index files simply list the names of the log files on disk.
When logs are added or removed, MySQL updates the appropriate index
file.

You can add settings to my.cnf to specify the
log index filenames and locations:

log-bin-index = /var/db/logs/log-bin.index
relay-log-index = /var/db/logs/relay-log.index

Never change these settings once a slave is configured and
replicating. If you do, MySQL uses the new values when it is
restarted and ignores the older files.


7.3.3.3 Status files


MySQL 3.23 and 4.0 slaves use a file
named master.info to store information about
their master. The file contains the master's
hostname, port number, username, password, log file name, position,
and so on. MySQL updates the log position and log file name (as
necessary) in this file as it reads queries from the
master's binary log. While you should never need to
edit the file, it's worth knowing what it is used
for.

The master-info-file option in
my.cnf can be used to change the name and
location of the master.info file:

master-info-file = /home/mysql/master-stuff.info

However, there's rarely a need to do so.

MySQL 4.0 slaves use an additional status file for the SQL thread to
track its processing of the relay log, in much the same way the
master.info file is used. The
relay-log-info-file setting can be used to change
the filename and path of this file:

relay-log-info-file = /home/mysql/logs/relay-log.info

Again, you won't need to change the default.


7.3.3.4 Filtering


There
may be times when you don't need to replicate
everything from the master to the slave. In such
situations you can use the various replication filtering options to
control what is replicated. This is well covered in the MySQL
documentation, so we'll just recap the important
parts.

There are two sets of configuration options for filtering. The first
set applies to the binary log on the master and provide per-database
filtering:

binlog-do-db=dbname
binlog-ignore-db=dbname

Any queries filtered on the master aren't written to
its binary log, so the slave never sees them either.

The second set of options applies to the relay log on the slave. That
means the slave still has to read each query from the
master's binary log and make a decision about
whether or not to keep the query. The CPU overhead involved in this
work is minimal, but the network overhead may not be if the master
records a high volume of queries.

Here is the second set of options:

replicate-do-table=dbname.tablename
replicate-ignore-table=dbname.tablename
replicate-wild-do-table=dbname.tablename
replicate-wild-ignore-table=dbname.tablename
replicate-do-db=dbname
replicate-ignore-db=dbname
replicate-rewrite-db=from_dbname->to_dbname

As you can see, the slave options are far more complete. They not
only offer per-table filtering but also allow you to change the
database or table names on the fly.


/ 105