Recipe 2.11. Archiving a Tree of Files into a Compressed tar File
Credit: Ed Gordon, Ravi Teja
Bhupatiraju
Problem
You need to archive all of the files and folders in a subtree into a
tar archive file, compressing the data with
either the popular gzip approach or the
higher-compressing bzip2 approach.
Solution
The Python
Standard Library's tarfile module
directly supports either kind of compression: you just need to
specify the kind of compression you require, as part of the option
string that you pass when you call
tarfile.TarFile.open to create the archive file.
For example: import tarfile, os
def make_tar(folder_to_backup, dest_folder, compression='bz2'):
if compression:
dest_ext = '.' + compression
else:
dest_ext = ''
arcname = os.path.basename(folder_to_backup)
dest_name = '%s.tar%s' % (arcname, dest_ext)
dest_path = os.path.join(dest_folder, dest_name)
if compression:
dest_cmp = ':' + compression
else:
dest_cmp = ''
out = tarfile.TarFile.open(dest_path, 'w'+dest_cmp)
out.add(folder_to_backup, arcname)
out.close( )
return dest_path
Discussion
You can pass, as argument compression to function
make_tar, the string 'gz' to get
gzip compression instead of the default
bzip2, or you can pass the empty string
'' to get no compression at all. Besides making
the file extension of the result either .tar,
.tar.gz, or .tar.bz2, as
appropriate, your choice for the compression
argument determines which string is passed as the second argument to
tarfile.TarFile.open: 'w', when
you want no compression, or 'w:gz' or
'w:bz2' to get two kinds of compression. Class tarfile.TarFile offers several other
classmethods, besides open,
which you could use to generate a suitable instance. I find
open handier and more flexible because it takes
the compression information as part of the mode
string argument. However, if you want to ensure
bzip2 compression is used unconditionally, for
example, you could choose to call classmethod
bz2open instead. Once we have an instance of class tarfile.TarFile
that is set to use the kind of compression we desire, the
instance's method add does all we
require. In particular, when string folder_to_backup
names a "directory" (or folder),
rather than an ordinary file, add recursively adds
all of the subtree rooted in that directory. If on some other
occasion, we wanted to change this behavior to get precise control on
what is archived, we could pass to add an
additional named argument recursive=False to
switch off this implicit recursion. After calling
add, all that's left for function
make_tar to do is to close the
TarFile instance and return the path on which the
tar file has been written, just in case the
caller needs this information.
See Also
Library Reference docs on module
tarfile. |