Using tar
The lowest common denominator of tape backups in the Linux and UNIX world is
tar, the same program that's used to create archives for grouping multiple
files into a single file for easier storage or transmission over the Internet.
In fact, the name tar stands for tape archive. You can use tar as part of a
backup procedure for a network, using either a client-initiated or a
server-initiated strategy. Similar procedures apply to many other common Linux
backup programs, such as cpio and dump, but the commands differ in their
details, so you'll need to look up how each program handles certain
network-related options, particularly for client-initiated backups. I cover tar
here both because it's the lowest common denominator and because it's used by
several other tools, such as smbtar and AMANDA.
Basic tar Features
The tar utility is extremely powerful and supports a large number of options.
These options come in two forms: commands and qualifiers. Commands tell tar
what to do; for instance, create an archive, list the contents of an archive,
or extract files from an archive. Qualifiers modify the action of commands;
they're used to specify the device or file tar uses, to limit the files that
might be backed up, to compress the resulting archive with gzip or bzip2, and
so on. When running tar, the basic syntax is:

    tar command [qualifiers] filenames

The filenames you specify are often directory names, possibly including the
root directory (/). When you specify a directory name, tar backs up all the
files and subdirectories in that directory. Tables 17.1 and 17.2 list some of
the more common tar commands and qualifiers. These are only a sample, however,
particularly for qualifiers. You should consult the tar man page for
information on more options.
Table 17.1. Common tar Commands

Command              Abbreviation  Purpose
--create             c             Creates an archive.
--concatenate        A             Adds a tar file to an existing archive.
--append             r             Adds ordinary files to an existing archive.
--update             u             Adds ordinary files that are newer than
                                   those in the existing archive.
--diff or --compare  d             Compares archived files to those on disk.
--list               t             Displays contents of an archive.
--extract or --get   x             Copies files out of an archive.
Table 17.2. Common tar Qualifiers

Qualifier                   Abbreviation  Purpose
--absolute-paths            P             Keeps the leading / on filenames.
--bzip2                     I             Passes the archive through bzip2. (Not
                                          available on older versions of tar;
                                          newer GNU tar releases use j as the
                                          abbreviation.)
--directory dir             C             Changes to the specified directory
                                          before acting.
--exclude file              (none)        Blocks file from being backed up.
--exclude-from file         X             Blocks all files listed in file from
                                          being backed up.
--file [host:]file          f             Performs backup using file on host as
                                          the archive file. (The host option is
                                          used in client-initiated network
                                          backups.)
--gzip or --ungzip          z             Passes the archive through gzip or
                                          gunzip.
--listed-incremental=file   g             Creates or uses an incremental backup
                                          file.
--multi-volume              M             Processes a multi-tape archive.
--one-file-system           l             Backs up or restores just one
                                          filesystem. (Newer GNU tar releases
                                          assign l to --check-links, so the long
                                          form is safer there.)
--same-permissions or       p             Preserves all username and permission
--preserve-permissions                    information.
--tape-length N             L             Specifies the length of a tape in
                                          kilobytes; used in conjunction with
                                          --multi-volume.
--verbose                   v             Displays filenames as they're
                                          processed.
--verify                    W             Compares original files to archive
                                          immediately after writing it.
As an example of these options in use, suppose a computer has a SCSI tape
drive, which can be accessed as /dev/st0 or /dev/nst0. You could back up the
/home directory of this computer, preserving all permissions and displaying
the filenames as they're backed up, with the following command:

    # tar --create --verbose --file /dev/st0 /home

The abbreviations shown in Tables 17.1 and 17.2 allow for a somewhat more
succinct variant of this command:

    # tar cvf /dev/st0 /home
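Before committing to a tape, the create, list, and extract commands from Table
17.1 can be rehearsed against an ordinary archive file. The following is a
minimal sketch, assuming GNU tar and a POSIX shell; the scratch directory, the
notes.txt file, and the home.tar archive name are hypothetical stand-ins rather
than part of the procedure described above.

```shell
#!/bin/sh
# Rehearse create, list, and extract against a regular file instead of a tape.
# All paths live under a scratch directory; the names are illustrative only.
set -eu

work=$(mktemp -d)
mkdir -p "$work/home/jbrown" "$work/restore"
echo "sample data" > "$work/home/jbrown/notes.txt"

# Create the archive; --directory makes member names relative to $work.
tar --create --verbose --file "$work/home.tar" --directory "$work" home

# List what was stored.
tar --list --file "$work/home.tar"

# Extract into a separate directory, keeping the stored permissions.
tar --extract --same-permissions --file "$work/home.tar" \
    --directory "$work/restore"
```

Substituting a tape device such as /dev/st0 for the archive file name gives the
tape form of the same commands.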
A few tar options deserve special discussion. These are --one-file-system,
--same-permissions, --listed-incremental, and --verify.

The --one-file-system option is particularly useful for backups because Linux
systems may include virtual filesystems (such as /proc), removable media, and
perhaps even regular filesystems that should not be backed up. Using
--one-file-system restricts tar to the filesystems that hold the directories
or files you specify, skipping anything mounted beneath them, so when you use
this option, you should list all the partitions you want to back up.
Alternatively, you could omit --one-file-system and use --exclude or
--exclude-from to explicitly block directories such as /proc from being backed
up.

The --same-permissions option is particularly important for system files
because tar otherwise sometimes loses certain permissions, particularly those
that are not allowed by the current umask value. This option matters when
restoring files, but not when backing them up.

The --listed-incremental
option creates or uses a file that records information on the files that tar backs up. The first time the program
is run with this option, the specified file is created and all files are backed
up. Subsequent uses of this option cause only files that have been added or
changed since the last backup to be backed up. This allows tar to create a partial
backup, which is much smaller than the regular full
backup. Many administrators perform full backups every week or month, and
partial backups on a daily basis. This provides good protection against
disaster with minimal effort. (When restoring incremental backups, though, you
may find files you've intentionally deleted have been restored, because the
increment procedure doesn't mark files deleted since the last backup as
deleted.) In a network environment, you may want to rotate which machines
receive full backups on any given day; for instance, machine1 on Monday,
machine2 on Tuesday, and so on.

Finally, --verify
is intended to check the accuracy of your backup. The verify pass will increase
backup time substantially, but it may be worthwhile, particularly if your tape
drive doesn't include its own verify feature. (Most mid-range and high-end
drives do include verification in hardware, often referred to as
read-after-write.) Any verification performed using --verify or on a second
pass using the --diff command is likely to turn up some false alarms, because
Linux systems are constantly active, so some files are likely to change
between the backup and verify passes. Log files, files in /tmp, spool files
such as mail and printer queues, and perhaps user files are particularly
likely to change. If you only see a few changes in files that might reasonably
have changed during the backup, there's no cause for alarm. If you see changes
in other files, particularly in static files such as the contents of /usr,
then it's possible that your tape, tape drive, or network connections are at
fault.

Most modern tape drives support built-in compression, so there's
no need to use the --bzip2 or --gzip options. Indeed, these options are
potentially dangerous even on the low-end drives that lack compression
features. The reason is that tar uses gzip or bzip2 to compress an entire
archive, not individual files. If an error occurs when reading back a
compressed archive, tar won't be able to recover, so all the data in the
archive after that point will be lost. Tape drives' built-in compression
algorithms are more robust against such errors; in the event of an error,
you're likely to lose a file or two, but not the entire archive. Some backup
programs compress each file separately rather than the archive as a whole, and
so are more robust against errors. For instance, the commercial BRU
(http://www.tolisgroup.com) package uses file-by-file compression when
compression is enabled.
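The full-then-partial cycle that --listed-incremental enables can be tried on a
scratch directory before it is trusted with real data. This is a minimal
sketch, assuming GNU tar; the state.snar snapshot file name, the data
directory, and the one-second pause (added only so the new file's timestamp is
clearly later than the first run) are illustrative assumptions.

```shell
#!/bin/sh
# Demonstrate a full backup followed by an incremental one with GNU tar.
# File and directory names are hypothetical; archives are ordinary files.
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo one > "$work/data/a.txt"
echo two > "$work/data/b.txt"

# First run: the snapshot file does not exist yet, so everything is archived.
tar --create --file "$work/full.tar" \
    --listed-incremental="$work/state.snar" --directory "$work" data

# Add a file after the full backup.
sleep 1
echo three > "$work/data/c.txt"

# Second run: the same snapshot file limits the archive to what changed.
tar --create --file "$work/incr.tar" \
    --listed-incremental="$work/state.snar" --directory "$work" data

echo "Full backup contents:"
tar --list --file "$work/full.tar"
echo "Incremental backup contents:"
tar --list --file "$work/incr.tar"
```

Because each run updates the snapshot file, keeping a copy of the snapshot
taken at the full backup is one way to base every daily run on the same
starting point.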
Testing Local tar and Tape Functions
When setting up a backup server, you should test basic backup
functions locally before introducing the network into the equation. Local
backups are invariably simpler than are network backups, so if you know that
local backups work, you can be reasonably confident in attributing problems
with network backups to the network configuration. In addition, it's important
to remember to back up the backup server itself; like any other computer, it
can fail, and if it fails without a backup, the rest of your network will be at
risk.

The most basic local test is to try backing up using a command
like the one presented in the previous section. The trickiest part of this is
in determining the correct device file to use. Four device files are common for
mid-range and high-end tape devices: /dev/st0 ,
/dev/nst0 , /dev/ht0 , and /dev/nht0 . The first two refer to SCSI tape drives, and the
second two refer to EIDE/ATAPI devices. The device files whose names begin
with n are nonrewinding devices; when an operation completes, the driver
leaves the tape positioned where the operation ended, so you can place
multiple backups on a single tape. Device filenames without the leading n
refer to rewinding devices, which automatically rewind the tape after every
operation. Note that this is a characteristic of the device file, not of the
hardware; every tape device has both a rewinding and a nonrewinding device
file. If you have multiple tape drives, the second will have a filename that
ends in 1 instead of 0, the third's filename will end in 2, and so on.

There are a few exotic hardware types that use other device
filenames. For instance, some older tape drives interfaced through the floppy
port and used device filenames like /dev/qft0
and /dev/nqft0 . Such drives are
very low in capacity and slow by today's standards, and so are unsuitable for
network backups. Other drives use specialized interface hardware. Check the
Linux kernel configuration for drivers for such boards.

If you have problems with a local backup, check your device
hardware and check the drivers for the device. SCSI drives need both basic SCSI
support and SCSI tape support enabled. Likewise, EIDE/ATAPI drives need both
EIDE support and EIDE/ATAPI tape support. Be sure to check your ability to both
back up and restore data; try using a small test directory, then a larger one.
Use a verify function to confirm that your data are being recovered correctly.

Particularly if you want to place multiple backups on a single tape, the mt
utility may be useful. This tool lets you control the tape drive, setting
options such as its built-in compression and moving among various backup sets
stored on the tape.

NOTE

The mt man page refers to backup sets as files, and tar documentation often
does the same. Think of the tape as a hard disk without a filesystem; your
backups are really just tar files stored sequentially on the tape, hence this
terminology.
You may want to experiment with tar and mt to place multiple backups on a tape
using a nonrewinding tape device. The basic syntax for mt is as follows:

    mt [-f device] operation [count] [arguments]

The operation is a command like fsf (forward space files), bsf (backward space
files), rewind (rewind tape), and datcompression (set compression; send an
argument of 0 to disable compression or anything else to enable it). For
instance, the following string of commands creates two backups and then
verifies them:

    # tar cvplf /dev/nst0 testdir-1/
    # tar cvplf /dev/nst0 testdir-2/
    # mt -f /dev/nst0 rewind
    # tar df /dev/nst0 testdir-1/
    # mt -f /dev/nst0 fsf 1
    # tar df /dev/nst0 testdir-2/
Most of these commands should be followed by tape activity.
The first two tar commands will
show the names of the files being backed up, and the last two tar commands will show the names of any
files that differ between the original and the backup. The second mt command is needed when reading back the
archives, but not when creating them.
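The compare step can also be rehearsed without a tape drive, which makes it
easier to see what a reported difference looks like. The sketch below assumes
GNU tar and substitutes an ordinary file, backup-1.tar, for the tape device;
that file name and report.txt are hypothetical.

```shell
#!/bin/sh
# Show how tar --diff behaves on an unchanged and then a changed directory.
# An ordinary archive file stands in for /dev/nst0; names are illustrative.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir testdir-1
echo "original" > testdir-1/report.txt

tar --create --file backup-1.tar testdir-1/

# Nothing has changed yet, so the comparison prints nothing and succeeds.
tar --diff --file backup-1.tar testdir-1/
echo "first comparison: no differences"

# Change the file after the backup, as a busy system might.
echo "changed after the backup" > testdir-1/report.txt

# GNU tar names each differing file and exits with a nonzero status.
diff_status=0
diff_output=$(tar --diff --file backup-1.tar testdir-1/ 2>&1) || diff_status=$?
printf '%s\n' "$diff_output"
echo "second comparison exit status: $diff_status"
```

On a real tape, the mt positioning commands shown earlier are still needed
between archives; this rehearsal covers only the comparison itself.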
Performing a Client-Initiated Backup
A client-initiated backup using tar requires that the client have a tar program and that the backup server be
running an appropriate server program to grant the client's tar program access to the tape device. There's
little special that you must do on the client side, aside from changing the tar commands from those described earlier.
The backup server's configuration isn't the standard one in most Linux
distributions, though, so you'll have to reconfigure the backup server.
Client-Initiated Network Configurations
The --file option shown in Table 17.2 takes a filename as an argument. This
may be a regular disk file, a device file that corresponds to a tape device, or
a path to a network resource. In this final case, the backup server must be
running the rshd daemon (which is often called in.rshd). This daemon allows a
remote system to execute commands on the system on which the server runs. The
tar program uses this ability to pass the tar file it creates to a device file
on the backup server. The rshd server comes with most Linux systems and is
usually run from a super server. An /etc/inetd.conf entry to handle this server
might resemble the following (entered as a single line):

    shell stream tcp nowait root /usr/sbin/tcpd /usr/sbin/in.rshd -h
If your system uses xinetd, you would need to create an equivalent entry in
/etc/xinetd.conf, or a dedicated startup file in /etc/xinetd.d, as described in
Chapter 4, Starting Servers. An xinetd configuration probably wouldn't call TCP
Wrappers (/usr/sbin/tcpd), but in either case, the security provided by TCP
Wrappers or directly by xinetd is important. The rshd daemon relies almost
exclusively on the caller's IP address for security. Although TCP Wrappers and
xinetd provide similar access control mechanisms, the redundancy on this
matter can be important in case of a security bug in rshd.

Although IP addresses are the strongest type of access control
used by rshd, the server also uses usernames to control remote access in order
to prevent ordinary users from running dangerous programs with undue authority
on the server. Ordinarily, rshd won't accept commands from root on any remote
system. The -h parameter to rshd, demonstrated in the preceding inetd.conf
entry, changes this default. This is extremely important because backups must
ordinarily be run with root privileges in order to back up sensitive system
files and all user files, depending upon your system's user file permissions.
If you omit -h, ordinary users will be able to perform backups to the server,
but only if the permissions for the device file on the server allow this.
(Most distributions
don't allow ordinary users to access tape device files in any meaningful way.)

WARNING

The -h option to rshd is broken or disabled on some systems, and on those
systems this procedure won't work. You may be able to use SSH instead: run an
SSH server on the backup server, and link ssh on the backup client to the rsh
name so that tar calls ssh to do the network transfer. This has security
advantages even for systems on which rshd works as described. This will only
work if you configure SSH to accept logins without requiring password
authentication, though, as described in Chapter 13, Maintaining Remote Login
Servers.
Because of the security issues surrounding rshd and its required configuration, the
best configuration for a client-initiated backup server of this type is to
dedicate a computer to this function. Such a computer need not be very
powerful, aside from having a tape backup unit and a fast network connection.
It should be protected from the Internet at large by a firewall, and ideally it
shouldn't contain any vital data or run servers aside from rshd and any others needed for its
configuration.
Performing the Backup
Once you've set up a backup server, you can perform backups with
it. To do so, you must insert a tape into the backup server's tape drive and
issue a command similar to the following on the backup client:

    # tar cvlpf buserver:/dev/st0 /home /var /
This command backs up the /home, /var, and / directories on the current system
to the rewinding tape device on buserver, and excludes any mounted filesystems
other than those explicitly specified. If the three specified directories are
the only filesystems on the computer, this command performs a complete network
backup of the client.

You can use the same type of addressing with mt as you can with tar to specify
a network backup device. For instance, mt -f buserver:/dev/nst0 rewind will
rewind the tape in buserver's tape drive.

In sum, performing a client-initiated network backup using tar is very much
like performing a local
backup using tar . You must add
the name of the backup server to the device specification, but otherwise the
commands used are identical. The extra effort goes into configuring the backup
server system.
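Where SSH replaces rshd, GNU tar's --rsh-command qualifier is an alternative to
linking ssh to the rsh name. The following is a hedged sketch rather than a
tested procedure: it assumes GNU tar on the client, an ssh binary at
/usr/bin/ssh, key-based root login to buserver, and the rmt helper that tar
expects on the remote side. The DRY_RUN variable and the run and client_backup
functions are hypothetical conveniences; by default the script only prints the
command it would execute.

```shell
#!/bin/sh
# Build (and optionally run) a client-initiated backup over SSH.
# buserver and /dev/st0 come from the text; everything else is an assumption.
set -eu

BACKUP_HOST=${BACKUP_HOST:-buserver}
TAPE_DEVICE=${TAPE_DEVICE:-/dev/st0}
RSH_COMMAND=${RSH_COMMAND:-/usr/bin/ssh}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN is 1; execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Long option names are used so the meaning does not depend on the tar version.
client_backup() {
    run tar --create --verbose --one-file-system --same-permissions \
        --rsh-command="$RSH_COMMAND" \
        --file "$BACKUP_HOST:$TAPE_DEVICE" /home /var /
}

client_backup
```

Setting DRY_RUN=0 in the environment runs the backup for real, which requires
root privileges on the client and a tape loaded in the server's drive.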
Performing a Server-Initiated Backup
Server-initiated backups, as described earlier, have the
advantage of allowing a central server to control the scheduling of backups.
This type of setup places the bulk of the configuration details on the backup
client, which must run an appropriate network server package. This section
describes using the Network Filesystem (NFS) server, as covered in Chapter 8,
File Sharing via NFS, to perform network backups. Once the client is
configured, the actual backup operation is much like a local one, although you
must mount the backup client's export on the backup server system in order to
perform the backup.

NOTE

It's possible to use a file-sharing protocol other than NFS for network
backups. In fact, the upcoming section, "Using smbmount," describes using
smbmount to back up Windows file shares. For backing up a Linux system in this
way, a protocol that preserves Linux file ownership and permission information
is a practical necessity; hence, NFS is a good choice.
Server-Initiated Network Configurations
You should read Chapter 8 to learn how to configure a Linux
computer to export specified filesystems. To perform a complete backup of a system, you must configure that
system to allow the backup server to mount all of its important disk
filesystems. You can omit /proc,
removable media you don't want to back up, and so on. Ordinarily, you'll
configure the backup client to export all its hard disk partitions.

For backup purposes, the backup client may export all
directories with read-only access; the backup server doesn't need to write to
these directories. If you need to restore data, though, you'll need to change
this configuration to allow write access to the relevant directories.
Alternatively, you could use some more convoluted method of restoring data,
such as restoring it to a directory on the backup server, which you can then
export for the backup client to read; or you could use a client-initiated
restore if you configure the backup server appropriately.

One potentially dangerous requirement of a server-initiated backup
configuration is that the backup server's root user must have full root access
rights on the backup client; in other words, you must use the no_root_squash
option when you define
exports. Without this option, the backup server won't be able to read many
important system files, and perhaps not many users' files, either. This
requirement allows miscreants with local network access or who can spoof the
backup server's address to read all the files on the backup client, and even
modify those files if you export client directories using read-write mode. For
this reason, you should protect the backup server and all its clients with a
good firewall to minimize the risk of outside access, and carefully monitor
logs for evidence of tampering or other abuse.

As an example of a configuration, consider a client with three partitions that
should be backed up: /home, /var, and / (root). You can export these
filesystems by creating appropriate /etc/exports entries. If the backup server
is called buserver, these entries might resemble the following:

    /home buserver(ro,no_root_squash)
    /var  buserver(ro,no_root_squash)
    /     buserver(ro,no_root_squash)
If you need to restore files, you'll have to change the ro to rw
and restart the NFS server. Another challenge, particularly at restore time, is
keeping file ownership intact. If the backup specifies that a file is owned by,
say, jbrown, and if this name doesn't
map appropriately onto a correct UID, then the ownership of the file may be
lost or mangled. As a general rule, it's simplest if the UIDs associated with
specific users are the same on both the client and the server at both backup
and restore time.
Performing the Backup
The backup commands are just like those described earlier, but
you must first mount the backup client's exports on the backup server system.
For instance, suppose the backup client is called buclient, and a mount point
called /mnt/client exists for holding its backup directories. You might then
mount and back up its files by issuing commands like the following:

    # mount -t nfs -o soft buclient:/ /mnt/client
    # mount -t nfs -o soft buclient:/var /mnt/client/var
    # mount -t nfs -o soft buclient:/home /mnt/client/home
    # cd /mnt/client
    # tar cvlf /dev/st0 home var ./
NOTE

The preceding sequence assumes that the
backup client's NFS server does not export mounted subdirectories. If the NFS
server does export mounted subdirectories, you only need the first mount command.
One point to note about this particular
backup sequence is that it uses cd to change into the main mount point for
the backup client computer. Thus, the view in this directory is of the backup
client's directory tree. The tar command backs up the individual mount points in this directory
tree, but omits the complete path. The result is a tape that includes no
references to the /mnt/client mount point. Files on this tape may be restored by mounting the
target partition at the same mount point or elsewhere and moving into the
mounted directory to do the restore. It's also possible to back up with a
command like the following:

    # tar cvlf /dev/st0 /mnt/client/home /mnt/client/var /mnt/client
Such a command includes references to the /mnt/client directory (or, more
precisely, mnt/client, missing the leading /, unless you use the
--absolute-paths qualifier). Such a backup can therefore only be restored if
the target system is mounted in the same way as at backup, or at least in a
directory that includes a mnt/client subdirectory of its own. Restores lacking
such a directory tree will create one, possibly on the backup server machine
rather than the backup client.

WARNING

One potentially serious drawback of this
type of server-initiated backup is that the backup process may stall if the
backup client goes offline during the process. The -o soft mount option used in the preceding example allows the NFS client on the
backup server to return errors to tar, which may be
preferable to a hung backup process.
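The difference between the two tar invocations in this section, one run from
inside the mount point and one given full paths, shows up in the member names
stored in the archive. The sketch below illustrates this with a local scratch
directory standing in for the NFS mount; it assumes GNU tar, and the directory
layout and file names are hypothetical.

```shell
#!/bin/sh
# Compare archive member names for relative and full-path backups.
# A scratch tree stands in for /mnt/client; no NFS mount is involved.
set -eu

work=$(mktemp -d)
mount_point="$work/mnt/client"
mkdir -p "$mount_point/home/jbrown"
echo "sample" > "$mount_point/home/jbrown/notes.txt"

# Run tar from inside the mount point: names start at home/.
( cd "$mount_point" && tar --create --file "$work/relative.tar" home )
relative_names=$(tar --list --file "$work/relative.tar")

# Give tar the full path: names keep the mount point, minus the leading /.
# The warning tar prints about stripping the leading / is discarded here.
tar --create --file "$work/absolute.tar" "$mount_point/home" 2>/dev/null
absolute_names=$(tar --list --file "$work/absolute.tar")

echo "Members when run from inside the mount point:"
printf '%s\n' "$relative_names"
echo "Members when given the full path:"
printf '%s\n' "$absolute_names"
```

The first listing can be restored from inside any directory, whereas the second
recreates the mount point's path beneath wherever the restore is run.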