9.4 Rolling Your Own Backup Script
There are always
circumstances in which the standard tools aren't
enough to get the job done. Perhaps they're not
flexible enough, they're too slow, or they just
don't work the way you'd like. The
solution, of course, is to build your own tool. In doing so, you may
decide to use the existing utilities or to just do your own thing.Let's look at writing a simple MySQL backup script
in Perl. While it isn't the most powerful or
flexible script in the world, it can serve as a starting point for
building a custom solution.The script
(mysnap.pl)
solves the following problem. You have a MySQL server that keeps all
its data on a volume with snapshot capabilities. Every 12 hours,
you'd like to perform the following tasks to make a
good snapshot and gather a list of tables and their sizes:Flush and lock all MyISAM tables.Assemble a list of every table and its size.Initiate a snapshot.Unlock the tables.Output the list of table sizes.
The script's output can be captured and
automatically mailed to a backup administrator. A
cron entry like this does the job nicely if
you're using Vixie cron (common
on Linux and FreeBSD):
MAILTO=backup-admin@example.com
00 */12 * * * /usr/local/bin/mysnap.pl
Otherwise, you can use the more traditional format:
00 0,12 * * * /usr/local/bin/mysnap.pl | mail backup-admin@example.com
You'll find the complete script listed here.
#!/usr/bin/perl -w
#
# mysnap.pl - snapshot mysql and mail stats to backup admins
use strict;
use DBIx::DWIW;
$|=1; # unbuffer output
my $db_user = 'backup_user';
my $db_pass = 'backup_pass';
my $db_name = 'mysql';
my $db_host = 'localhost';
my $command = '/usr/local/bin/snapshot';
my $conn = DBIx::DWIW->Connect(DB => $db_name, User => $db_user,
Pass => $db_pass, Host => $db_host);
my @table_sizes;
# flush and lock all tables
$conn->Execute("FLUSH TABLES WITH READ LOCK");
# gather stats on the tables
my @db_list = $conn->FlatArray("SHOW DATABASES");
for my $db (@db_list)
{
$conn->Execute("USE $db") or die "$!";
my @table_info = $conn->Hashes("SHOW TABLE STATUS");
for my $table (@table_info)
{
my $name = $table->{Name};
my $size = $table->{Data_length};
push @table_sizes, ["$db.$name", $size];
}
}
# run the snapshot
system($command);
# unlock the tables
$conn->Execute("UNLOCK TABLES");
$conn->Disconnect;
# sort by size and print
for my $info (sort { $b->[1] cmp $a->[1] } @table_sizes)
{
printf "%-10s %s\n", $info->[1], $info->[0];
}
exit;
_ _END_ _
Let's walk through the basic flow. The first thing
to notice is that the script requires a module from CPAN.
DBIx::DWIW simplifies most Perl work with
MySQL.[7] After
using the necessary modules, define the necessary variables for the
connection to MySQL. Then you execute a FLUSH
TABLES WITH
READ LOCK to make sure all
changes are on disk and that no further changes will happen.[7] The DWIW stands for Do What I Want, a play on
Perl's Do What I Mean.
Once the tables have all been flushed and locked, the script collects
a list of all the databases on the server and iterates through them.
In each database, the script gets the status of all the tables using
SHOW TABLE
STATUS, which produces records that look like
this:
mysql> SHOW TABLE STATUS \G
*************************** 1. row ***************************
Name: journal
Type: MyISAM
Row_format: Dynamic
Rows: 417
Avg_row_length: 553
Data_length: 230848
Max_data_length: 4294967295
Index_length: 5120
Data_free: 0
Auto_increment: NULL
Create_time: 2001-12-09 23:18:06
Update_time: 2002-06-16 22:20:13
Check_time: 2002-05-19 17:03:35
Create_options:
Comment:
The script grabs the Name and
Data_length fields for each table and stores them
in the @table_sizes list. Once that data has been
gathered, the script calls the snapshot command.
Finally, it unlocks the tables and prints the list of tables and
sizes (sorted by size).Running
mysnap.pl
produces output like this:
$ mysnap.pl
9300388448 Datascope.SymbolHistory
1458868716 Chart.SymbolHistory
773481608 logs.pfs
749644404 IDX.LinkLog
457454228 SEC.SEC_Filings
442951712 IDX.BusinessWireArticles
343099968 Datascope.Symbols
208388096 IDX.Headlines
...
As expected, the largest tables are listed firstregardless of
which databases they reside in.There are many ways mysnap.pl can be improved or
enhanced. It could:Perform more error checking.Compare the current table sizes with those from the previous run.Notice whether a table has grown beyond a preset threshold.Ignore Heap tables, since they don't reside on disk.
None of those enhancements are particularly difficult. With even a
basic grasp of Perl and a bit of time, you can transform that script
to something custom-tailored for your needs.