Recipe 2.24. Counting Pages of PDF Documents on Mac OS X
Credit: Dinu Gherman, Dan Wolfe
Problem
You're running a reasonably recent version of Mac OS X (version 10.3
"Panther" or later), and you need to know the number of pages in a
PDF document.
Solution
The PDF format and Python are both natively integrated with Mac OS X
(10.3 or later), and this allows a rather simple solution:
#!/usr/bin/python
# Uses the Python that ships with Mac OS X, which provides CoreGraphics.
import CoreGraphics

def pageCount(pdfPath):
    "Return the number of pages for the PDF document at the given path."
    pdf = CoreGraphics.CGPDFDocumentCreateWithProvider(
        CoreGraphics.CGDataProviderCreateWithFilename(pdfPath)
    )
    return pdf.getNumberOfPages()

if __name__ == '__main__':
    import sys
    # Print one page count per path given on the command line.
    for path in sys.argv[1:]:
        print pageCount(path)
Discussion
A reasonable alternative to this recipe might be to use the PyObjC
Python extension, which (among other wonders) lets Python code reuse
all the power in the Foundation and AppKit frameworks that come with
Mac OS X. Such a choice would let you write a Python script that is
also able to run on older versions of Mac OS X, such as 10.2 "Jaguar".
However, relying on Mac OS X 10.3 or later ensures we can use the
Python installation that is integrated as a part of the operating
system, as well as such goodies as the CoreGraphics Python extension
module (also part of Mac OS X "Panther") that lets your Python code
reuse Apple's excellent Quartz graphics engine directly.
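As a rough illustration of the PyObjC alternative, the sketch below counts pages through AppKit's NSPDFImageRep class instead of CoreGraphics. This is only one way the idea might be realized, not necessarily what the recipe's authors had in mind. The function name pageCountPyObjC and the choice of exceptions are made up for this example, and the code assumes PyObjC is installed so that the Foundation and AppKit modules can be imported.

```python
# Sketch only: a PyObjC-based page counter built on AppKit's NSPDFImageRep.
# The name pageCountPyObjC and the exceptions raised are illustrative choices.
try:
    from Foundation import NSData
    from AppKit import NSPDFImageRep
except ImportError:
    # PyObjC is not installed (or this is not Mac OS X).
    NSData = NSPDFImageRep = None

def pageCountPyObjC(pdfPath):
    "Return the number of pages for the PDF document at the given path."
    if NSPDFImageRep is None:
        raise RuntimeError("PyObjC (Foundation and AppKit) is required")
    # dataWithContentsOfFile_ returns None when the file cannot be read.
    data = NSData.dataWithContentsOfFile_(pdfPath)
    if data is None:
        raise IOError("cannot read %r" % (pdfPath,))
    # imageRepWithData_ returns None when the data is not a valid PDF.
    rep = NSPDFImageRep.imageRepWithData_(data)
    if rep is None:
        raise ValueError("not a PDF document: %r" % (pdfPath,))
    return rep.pageCount()
```

Calling pageCountPyObjC(path) for each path in sys.argv[1:] would mirror the command-line loop in the Solution. Unlike that version, this one loads the whole file into memory before counting its pages, which may matter for very large documents.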
See Also
PyObjC is at http://pyobjc.sourceforge.net/; information
on the CoreGraphics module is at http://www.macdevcenter.com/pub/a/mac/2004/03/19/core_graphicsl.