Recipe 2.16. Walking Directory Trees
Credit: Robin Parmar, Alex Martelli
Problem
You need to examine a directory, or an entire directory tree rooted
in a certain directory, and iterate on the files (and optionally
folders) that match certain patterns.
Solution
The generator os.walk
from the Python Standard Library module os is
sufficient for this task, but we can dress it up a bit by coding our
own function to wrap os.walk:

    import os, fnmatch
    def all_files(root, patterns='*',
                  single_level=False, yield_folders=False):
        # Expand patterns from semicolon-separated string to list
        patterns = patterns.split(';')
        for path, subdirs, files in os.walk(root):
            if yield_folders:
                files.extend(subdirs)
            files.sort()
            for name in files:
                for pattern in patterns:
                    if fnmatch.fnmatch(name, pattern):
                        yield os.path.join(path, name)
                        break
            if single_level:
                break
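
To see how the two optional flags change the result, here is a small sketch that runs the generator against a throwaway directory tree. The helpers build_demo_tree and relative, and the file names used, are hypothetical and exist only for this illustration; all_files is repeated from the Solution so that the block runs on its own.

```python
import fnmatch
import os
import tempfile

def all_files(root, patterns='*', single_level=False, yield_folders=False):
    # Same logic as the recipe's generator, repeated so this sketch is self-contained
    patterns = patterns.split(';')
    for path, subdirs, files in os.walk(root):
        if yield_folders:
            files.extend(subdirs)
        files.sort()
        for name in files:
            for pattern in patterns:
                if fnmatch.fnmatch(name, pattern):
                    yield os.path.join(path, name)
                    break
        if single_level:
            break

def build_demo_tree():
    # Hypothetical helper: create a temporary tree with two levels
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, 'sub'))
    for rel in ('a.py', 'b.txt', os.path.join('sub', 'c.py')):
        open(os.path.join(root, rel), 'w').close()
    return root

def relative(root, paths):
    # Hypothetical helper: strip the temporary root and normalize separators
    return [os.path.relpath(p, root).replace(os.sep, '/') for p in paths]

demo_root = build_demo_tree()

# Whole tree, Python files only
whole_tree = relative(demo_root, all_files(demo_root, '*.py'))
# Top directory only: the walk stops after the first directory
top_only = relative(demo_root, all_files(demo_root, '*.py', single_level=True))
# Top directory only, with subfolder names mixed in among the files
with_folders = relative(
    demo_root, all_files(demo_root, single_level=True, yield_folders=True))

print(whole_tree)
print(top_only)
print(with_folders)
```

With this tree, the first call reaches sub/c.py, the second stops at a.py, and the third lists the folder sub alongside the two top-level files.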
Discussion
The standard directory tree traversal generator
os.walk is powerful, simple, and flexible.
However, as it stands, os.walk lacks a few
niceties that applications may need, such as selecting files
according to some patterns, flat (linear) looping on all files (and
optionally folders) in sorted order, and the ability to examine a
single directory (without entering its subdirectories). This recipe
shows how easily these kinds of features can be added, by wrapping
os.walk into another simple generator and using
standard library module fnmatch to check filenames
for matches to patterns.

The file patterns are possibly case-insensitive
(that's platform-dependent) but otherwise
Unix-style, as supplied by the standard fnmatch
module, which this recipe uses. To specify multiple patterns, join
them with a semicolon. Note that this means that semicolons
themselves can't be part of a pattern.

For example, you can easily get a list of all Python and HTML files
in directory /tmp or any subdirectory thereof:

    thefiles = list(all_files('/tmp', '*.py;*.htm;*.html'))

Should you just want to process these files' paths
one at a time (e.g., print them, one per line), you do not need to
build a list: you can simply loop on the result of calling
all_files:

    for path in all_files('/tmp', '*.py;*.htm;*.html'):
        print(path)

If your platform is case-sensitive, and you want case-insensitive
matching, then you need to specify the patterns more laboriously,
e.g., '*.[Hh][Tt][Mm][Ll]' instead of
just '*.html'.
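
If spelling out every letter's case becomes tedious, one possible alternative is to translate each pattern into a regular expression with fnmatch.translate and compile it with re.IGNORECASE, which ignores case on any platform. This is only a sketch and not part of the recipe: the helper names compile_patterns and matches_any, and the sample filenames, are hypothetical.

```python
import fnmatch
import re

def compile_patterns(patterns):
    # Hypothetical helper: split 'a;b;c' and compile each Unix-style
    # pattern as a case-insensitive regular expression
    return [re.compile(fnmatch.translate(p), re.IGNORECASE)
            for p in patterns.split(';')]

def matches_any(name, compiled):
    # True if the bare filename matches at least one compiled pattern
    return any(rx.match(name) for rx in compiled)

html_patterns = compile_patterns('*.htm;*.html')
names = ['index.HTML', 'page.Htm', 'notes.txt', 'script.py']
html_names = [n for n in names if matches_any(n, html_patterns)]
print(html_names)
```

A test such as matches_any could stand in for the fnmatch.fnmatch call inside a wrapper like all_files, at the cost of compiling the patterns once up front.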
See Also
Documentation for the os.path module and the
os.walk generator, as well as the
fnmatch module, in the Library
Reference and Python in a
Nutshell.