Merging PDF Files with Python

I think of at least one good blog topic a day, but I never seem to get around to writing them up. So here is a recent discovery involving Python and PDF files.

I was tasked with going through a directory tree (sound familiar? I have been doing this sort of thing a lot lately), finding all the PDF files and merging them into one large PDF file. My first thought was: what kind of Adobe product do we need for that? Then I said: let me try that in Python. That, in fact, turned out to be very straightforward. Simply download pyPdf and take a look at the example there. No need for Adobe.


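import pyPdf
import os

# Directory that holds the PDF files to merge
startDir = "c:/temp"
os.chdir(startDir)

fileList = os.listdir(startDir)
output = pyPdf.PdfFileWriter()

for item in fileList:
    # Match the extension regardless of case (.pdf, .PDF)
    if os.path.splitext(item)[1].upper() == ".PDF":
        pdfDocument = os.path.join(startDir, item)
        # Leave the input file open: pyPdf may still read from it when the output is written
        input1 = pyPdf.PdfFileReader(open(pdfDocument, "rb"))
        for page in range(input1.getNumPages()):
            output.addPage(input1.getPage(page))

# The output lands in startDir because of the chdir above
outputStream = open("MyNewOutput.pdf", "wb")
output.write(outputStream)
outputStream.close()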

Just replace "c:/temp" with the directory you want to pull the PDFs from and you're in business. If you combine this with the recent script I wrote to explore directory structure, you can drill down through the directory tree searching for and merging PDF files.
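For the drill-down part, here is a rough sketch of how the file list could be collected. This is not the directory-exploring script mentioned above; it is just one way to do it with os.walk from the standard library, and the function name find_pdfs is made up for illustration.

```python
import os

def find_pdfs(root):
    """Return a sorted list of paths to every PDF file under root."""
    matches = []
    # os.walk visits root and every subdirectory beneath it
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Same case-insensitive extension check as the merge script
            if os.path.splitext(name)[1].upper() == ".PDF":
                matches.append(os.path.join(dirpath, name))
    # Sort so the pages are merged in a predictable order
    matches.sort()
    return matches
```

To use it, loop over find_pdfs(startDir) instead of os.listdir(startDir) and pass each path straight to PdfFileReader. One thing to watch: the merged file is written into startDir, so on a second run it would be found and merged in as well unless you write it somewhere else or skip it by name.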
