High Performance MySQL [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

High Performance MySQL [Electronic resources] - نسخه متنی

Derek J. Balling

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید








7.1 Replication Overview


Database replication has an undeserved reputation for being complex
to set up and prone to failure. The early versions of MySQL
replication were difficult to configure because the process was
inadequately documented. In its most basic form, replication consists
of two servers: a master and a
slave. The master records all queries that change data in its
binary log. The slave connects to the master, reads
queries from the master's binary log, and executes
them against its local copy of the data.

Before peering under the hood, let's look at the
types of problems replication does and doesn't
solve. If you're reading this in the hopes of
deploying replication to cure a problem, this section may help you
decide whether you're on the right track.


7.1.1 Problems Solved with Replication


Replication isn't
perfect, but it can be quite useful in solving several classes of
problems in the areas of scalability and backups.


7.1.1.1 Data distribution


Need to maintain a copy of your
data 10,000 miles away? Replication makes it trivial to do so. As
long as you have decent connectivity between two sites, you can
replicate a MySQL database. Think of this as scaling geographically.

In fact, it's possible to use
replication over a
network connection that isn't
"always on," such as traditional
dial-up using PPP. You can simply let the slave fail and reconnect
(it'll keep trying for a long time). Or you can use
one of the SLAVE STOP commands
(described later) to disable the slave's replication
when no connection is available. The master doesn't
mind if a slave disconnects for a few hours and then reconnects. But
you can't let the slave go for too long without
reconnecting to the master, because the older record of changes will
eventually be purged to keep the master from running out of disk
space.

Of course, you can also use replication between two servers that sit
next to each other. Any time you need multiple up-to-date copies of
your MySQL data, replication is often the easiest solution. You can
even replicate data between two MySQL servers on the same machine,
which is often a good way to test a new version of MySQL without
using a second machine.


7.1.1.2 Load balancing


If you use MySQL on a large data
warehousing application or a popular web site, odds are that your
server is running many more read queries (SELECT)
than write queries (INSERT,
UPDATE, and DELETE). If
that's the case, replication is an excellent way to
support basic load balancing. By having one or more slave servers,
you can spread most of the work among several servers.

The trick, of course, is coming up with an effective way to spread
the queries among the available slaves so they get roughly equal
workloads. One simple approach is to use
round-robin
DNS. Assign multiple IP addresses for a hostname such as
db-slave.example.com, and your application will
connect to one at random each time it opens a new connection to
MySQL.[1]

[1] Some operating systems don't
randomize this very well.


A more sophisticated approach involves the same solutions that are
used in web server load balancing. Network load-balancing products from
Foundry Networks, Cisco, Nortel, and others work just as well for
MySQL as they do for web sites.[2] The same is true
of software solutions such as the Linux Virtual Server (LVS) project
(http://www.linuxvirtualserver.org/).

[2] That's not entirely true, as
you'll soon see.


Load-balancing techniques are covered in greater detail in Chapter 8.


7.1.1.3 Backup and recovery


Backing up a busy MySQL server can be
difficult when your clients demand access to the data 24 hours a day.
Rather than deal with the complexity of implementing a backup process
that minimizes the impact on your clients, you might find it easier
simply to configure a slave server and back it up instead. Because
the slave will have an exact copy of the data, it's
just as good as backing up the master. Plus you'll
never impact the clients who are using the master. You might even
decide that you don't necessarily need or want to
back up the data as long as you have the "hot
spare" slave database available in the case of
problems with the master.

Chapter 9 covers backup and recovery techniques
in more detail.


7.1.1.4 High availability and failover


Using replication, you can avoid
making MySQL (or the system hosting it) a single point of failure in
your applications. Although there's no
out-of-the-box, automated failover solution for MySQL, you can
achieve a good degree of high availability using some relatively
simple techniques.

Using a creative DNS setup, you can insulate your applications from
having to know which server is the master and minimize the effort
involved in switching to a slave when the master fails.

Let's suppose you have two MySQL servers,
db1.example.com and
db2.example.com. Rather than hardcoding the name
of the master in your applications, you can set up
db.example.com as a
CNAME (or alias) in
DNS for your master. By using a very low Time To Live (TTL) on the DNS record,
you can ensure that clients will not cache the information longer
than necessary.

In the event your master goes down, simply update your DNS to point
db.example.com at the new master. As soon as the
TTL expires, your applications will pick up the new information and
connect to the proper server. There will be some time during which
the applications can't contact MySQL, but that time
will be relatively brief if you use a low enough TTL.[3]

[3] Be careful not to set it too low, however. The DNS resolvers
shipped with some operating systems have been known to simply ignore
TTLs that are deemed to be too low. When in doubt, test the
implementation before depending on it to work.


If you'd like to
eliminate entirely the need to use DNS, you can play similar games
using IP addresses. Because it's trivial to add and
remove additional IP addresses from a server, a scheme like this may
serve you well:

Use an IP address for each role, such as 192.168.1.1 for the master
and an address in the 192.168.1.100-192.168.120 range for each slave.

Make sure each machine has its own primary IP address in addition to
the role-based IP address.

When the master goes down, any of the slaves can be scripted to pick
up the IP address and immediately take over.

The master should be set so that if it ever loses its master address
or goes down, it doesn't automatically pick up the
address again (i.e., it assumes someone else will).


See the "High Availability" section
of Chapter 8 for more on the topic.


7.1.2 Problems Not Solved with Replication


Replication
doesn't solve every problem. Performance can become
an issue with replication because every slave still needs to execute
the same write queries as the master. In a very write-heavy
application, slaves need to be at least as powerful as the master. If
you attempt to use replication to set up a load-balancing system, you
may be disappointed. It may be more productive to implement a
partitioning system with multiple mastersone for each
partition of the data.

Also, there's no guarantee that a slave will be
completely in sync with the master at any given moment. If the load
on a slave is relatively high, the slave may fall behind and need
time to catch up.

Network bandwidth and latency can also become an issue. If the slave
is far away from the master (in a network sense) and there
isn't sufficient bandwidth, a slave may be able to
keep up with the master's query load, but it
won't be able to get data fast enough to do so.

Let's look at two specific examples that illustrate
problems not easily solved with replication.


7.1.2.1 Real-time data transmission


MySQL's
replication isn't the ideal vehicle for transmitting
real-time or nearly real-time data such as a stock quote feed or an
online auction site. In those applications, it's
important that the user sees up-to-date data no matter which database
server they use.

The only way to combat MySQL's lack of any latency
guarantee is to implement your own monitoring system. It needs to use
some sort of heartbeat to verify that each server has a reasonably
up-to-date copy of the data. In the event that a server falls too far
behind, the monitoring system needs to proactively remove it from the
list of active servers until it can catch up.

Of course, you can also build your application in such a way that it
updates all the slaves with the newest data. However, that can add a
lot of complexity and may not be worth the effort.
You'd end up writing a lot of code to handle the
exceptional conditions, such as when a single server falls behind or
is intermittently inaccessible. Testing and debugging all those
situations can be very time-consuming and difficult.

As Derek went over this, he thought,
"Wouldn't it be cool if MySQL could
provide a query response that signified, `Go ask
another server, I'm really busy right
now'?" This would allow clients to
automatically find willing servers in a multihost DNS rotation.

For example, the client wants to connect to
db.example.com (which is
db1, db2, and
db3). It connects (randomly) to
db2, and the server answers the query with
"I'm busy; go ask someone
else," whereupon the client knows enough to try
db1 or db3. Because the
client library would be connecting to the same virtual server, it
could transparently disconnect from the busy server and connect to
some other (hopefully less busy) server.

As a result, all you would need is some automated way for a slave
server to know how far behind they are and to shut themselves off
from queries when they get too far behind, and you'd
have some protection. Of course, this could also be subject to a
cascading failure. If all the slaves are very busy, the last thing
you'd want is for them to start removing themselves
from the pool of available servers. Continue on to Chapter 8 for a deeper discussion of these issues.


7.1.2.2 Online ordering


An ordering system is different from a
real-time stock quote feed or an auction site in a couple of
important ways. First, the ratio of reads to writes is likely to be
much lower. There isn't a constant stream of users
running read-only queries against the data. Also, when users are
running read queries, they're often part of a larger
transaction, so you can't send those read queries to
a slave. If you did, the slave might not have the correct data yet.
Transactions aren't written to the binary log and
therefore sent to slaves until they first commit on the master. A
slave will contain only committed transactions.

Replication can still be very useful for an order processing system.
It's reasonable to use a slave for running nightly
reports and queries that don't need the most recent
data.


7.1.3 Replication Performance


Having considered the problems that
replication does and doesn't solve, you may still be
a bit unsure about using it. Maybe replication is fast enough to get
the job done, despite the lack of any performance guarantees built
into the system. Wouldn't it be nice to have a
general idea of how fast replication really is?

That's exactly what we wanted to know when we first
began using replicationpartly for our own sanity and partly
because we knew a lot of people would soon be interested in MySQL
replication. The first question they'd ask is,
"How fast is it?" To answer that
question, we devised the following simple test to measure the
practical minimum replication latency in a particular environment.

A Perl script opened two database connections, one to the master and
one to the slave. The master and slave were on the same 100-Mbit
switched Ethernet network. The script then inserted a record into the
master and immediately attempt to retrieve it from the slave. If the
record wasn't available, the script immediately
retried. We kept the records intentionally small, containing just an
auto-increment column and a
VARCHAR field into which we inserted the current
time.

The results were encouraging. Of the 1,000 records inserted, 950 of
them were available on the first attempt. That left 50 records that
required at least a second try. Of those 50, 43 were available on the
second attempt. The remaining 7 were there on the third try. The test
was quick and very unscientific, but it can help to set realistic
expectations.


/ 105