I think of at least one good topic a day for a blog post, but I never seem to get around to writing them up. So here is a recent discovery involving Python and PDF.
I was tasked with going through a directory tree (sound familiar? I have been doing this sort of thing a lot lately), finding all the PDF files and merging them into one large PDF file. My first thought was: what kind of Adobe product do we need for that? Then I said: let me try that in Python. That, in fact, turned out to be very straightforward. Simply download pyPdf and take a look at the example there. No need for Adobe.
import os
import pyPdf

startDir = "c:/temp"
os.chdir(startDir)
fileList = os.listdir(startDir)
output = pyPdf.PdfFileWriter()

for item in fileList:
    # splitext returns a (root, extension) tuple, so take element [1]
    if os.path.splitext(item)[1].upper() == ".PDF":
        pdfDocument = os.path.join(startDir, item)
        # Leave the input file open: pyPdf may still need to read from it
        # when output.write() runs
        input1 = pyPdf.PdfFileReader(open(pdfDocument, "rb"))
        for page in range(input1.getNumPages()):
            output.addPage(input1.getPage(page))

# The merged file lands in startDir because of the chdir above
outputStream = open("MyNewOutput.pdf", "wb")
output.write(outputStream)
outputStream.close()
Just replace “c:/temp” with the directory you want to pull the PDFs from and you’re in business. If I combine this with the recent script I wrote to explore directory structure, you can drill down through the directory tree, searching for and merging PDF files.
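The directory-exploring script isn't reproduced in this post, so here is a rough stand-in for the "drill down" part, using only os.walk from the standard library. The function name find_pdfs is made up for this sketch and is not part of pyPdf or of the earlier script.

```python
import os


def find_pdfs(root):
    """Return the paths of all PDF files under root, sorted."""
    matches = []
    # os.walk visits root and every directory below it
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Same case-insensitive extension test as the merge script
            if os.path.splitext(name)[1].upper() == ".PDF":
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Looping over find_pdfs(startDir) instead of os.listdir(startDir) should then hand the merge loop full paths from every subdirectory, so the os.path.join step is no longer needed. I haven't wired the two together here, so treat it as a starting point.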