Monthly Archives: August 2011

Merging PDF Files with Python

I think of at least one good topic a day for a blog post, but I never seem to get around to writing them up. So here is a recent discovery in Python and PDF.

I was tasked with going through a directory tree (sound familiar? I have been doing this sort of thing a lot lately), finding all PDF files and merging them into one large PDF file. My first thought was: what kind of Adobe product do we need for that? Then I said: let me try that in Python. That, in fact, turned out to be very straightforward. Simply download pyPdf and take a look at the example there. No need for Adobe.


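import os
import pyPdf  # pyPdf 1.x (Python 2); its maintained successor is pypdf

startDir = "c:/temp"            # directory holding the PDFs to merge
outputName = "MyNewOutput.pdf"  # merged file, written into startDir

output = pyPdf.PdfFileWriter()
inputStreams = []  # pyPdf reads pages lazily, so keep inputs open until write()

# sorted() gives a predictable merge order; os.listdir() order is arbitrary
for item in sorted(os.listdir(startDir)):
    # skip non-PDFs and any merged file left over from a previous run
    if os.path.splitext(item)[1].upper() != ".PDF" or item == outputName:
        continue
    stream = open(os.path.join(startDir, item), "rb")
    inputStreams.append(stream)
    reader = pyPdf.PdfFileReader(stream)
    for pageNum in range(reader.getNumPages()):
        output.addPage(reader.getPage(pageNum))

outputStream = open(os.path.join(startDir, outputName), "wb")
output.write(outputStream)
outputStream.close()
for stream in inputStreams:
    stream.close()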

Just replace “c:/temp” with the directory you want to pull the PDFs from and you’re in business. Combined with the recent script I wrote to explore directory structure, this could drill down through the whole directory tree, searching for and merging PDF files.
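To combine the two ideas, the PDF search could use os.walk() from the standard library instead of os.listdir(), so it descends into every subdirectory. Below is a minimal Python 3 sketch, assuming a directory-by-directory, alphabetical merge order is acceptable; find_pdfs and its skip_name parameter are hypothetical names for illustration. The returned list would take the place of the os.listdir() loop in the merge script.

```python
import os


def find_pdfs(root, skip_name=None):
    """Return the paths of all .pdf files under root (hypothetical helper).

    Files are collected directory by directory: each directory's files in
    alphabetical order, then its subdirectories, also alphabetically.
    The extension check is case-insensitive, like the .upper() test in
    the merge script. A file named skip_name (e.g. the merged output)
    is left out.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place controls the order os.walk descends
        for name in sorted(filenames):
            if name == skip_name:
                continue
            if os.path.splitext(name)[1].upper() == ".PDF":
                found.append(os.path.join(dirpath, name))
    return found
```

Usage would look like `find_pdfs("c:/temp", skip_name="MyNewOutput.pdf")`, with each returned path opened and its pages added to the writer exactly as in the loop above.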


Filed under Uncategorized

ESRI UC 2011 – First Time Attending

This was the first year that I made it out to the ESRI User Conference in San Diego. As expected, the sheer size of the event was a little overwhelming. While I had spent some time looking over the agenda to see what I might be interested in, once inside the Convention Center it felt like I was spending more time running from session to session than actually sitting still. There are definitely a lot of resources to be found at the UC, but you have got to pick out the goodies. I thought some of the most useful material came from the “tips and tricks” type of technical sessions, while the large audiences at sessions like “How to implement Enterprise Solution …” made for a less fruitful learning experience. Most of all, I enjoyed meeting a bunch of fellow GIS folks, e.g. at the PUG Social. Looking forward to 2012!



Exploring Directory Structure [PYTHON]

So, here is a recent Python script I cobbled together. If you’re more of a programmer than I am, you will probably scream at some of my syntax. Do I really need all the ‘globals’? I have a sense there are more Pythonic (succinct) ways of accomplishing the same result. But this worked.

The idea was to type in a starting directory and a name for a log file. The script then recursively drills down through the directory tree, printing file names, examining file extensions, counting files and adding up file sizes, and finally spits out totals plus, as a visual aid, a quick and dirty histogram. So let me know what to improve next time around.


from __future__ import division

### Cobbled Together by Arne, July 2011
### using the MyOutput() bits by xiao, from
### http://tech.xster.net/tips/python-log-stdout-to-file/

import os, sys

class printToFile():
    ''' directs print output (stdout) to both the console and a text file'''
    def __init__(self, logfile):
        self.stdout = sys.stdout
        self.log = open(logfile, 'w')

    def write(self, text):
        self.stdout.write(text)
        self.log.write(text)
        self.log.flush()

    def close(self):
        # only close the log file; the real stdout must stay open
        self.log.close()

def dictHistogram(extCount):
    ''' Creates simple histogram based on key-value entries in
    file extension/frequency dictionary (extCount) '''
    vcount = 0
    for k,v in extCount.iteritems():
        vcount = vcount + v

    for k,v in extCount.iteritems():
        share = round((v/vcount) * 100,2)
        # one '#' per whole percent of the total
        print v, "\t", "File Type ", k, "\t", int(share)*"#", share, "% of total"

def displayFileInfo(entryPath):
    '''Displays the file name when exploreSub() encounters an entry
    that is not a directory. New extensions are added to the extensions
    list, files are counted in totalFiles and per extension in extCount,
    and file size is added up in totalSize'''
    global extensions
    global totalSize
    global totalFiles
    global extCount

    print "\t",os.path.basename(entryPath)
    ext = os.path.splitext(entryPath)[1].upper()

    totalSize = totalSize + os.path.getsize(entryPath)
    totalFiles = totalFiles + 1

    if ext not in extensions:
        extensions.append(ext)
        extCount[ext] = 1
    else:
        extCount[ext] = extCount[ext] + 1

def exploreSub(dirEx):
    ''' Drills down into a file tree, starting with dirEx. Empty
    directories are reported as such. Otherwise every entry is listed:
    subdirectories are marked <Dir> and files are passed to
    displayFileInfo(). Then exploreSub() recursively drills down into
    each subdirectory in turn.'''
    print
    print dirEx

    entries = os.listdir(dirEx)
    if not entries:
        print "Is an empty directory."
        print
    else:
        for entry in entries:
            entryPath = os.path.join(dirEx,entry)
            if os.path.isdir(entryPath):
                print "\t", entry, "<Dir>"
            else:
                displayFileInfo(entryPath)

        for entry in entries:
            entryPath = os.path.join(dirEx,entry)
            if os.path.isdir(entryPath):
                try:
                    exploreSub(entryPath)
                except OSError:
                    # e.g. no permission to list this directory
                    print "Unable to open ", entryPath

extensions = []
extCount = {}
totalFiles = 0
totalSize = 0

start = raw_input("Enter Directory: ")
log = raw_input("Enter Name for Logfile: ")

logger = printToFile(log)
sys.stdout = logger
exploreSub(start)

print
print "Total Number of Files ", totalFiles
print "Total File Volume ", round(totalSize/1048576, 2), " MB"
print "File Types Encountered: "
print extensions
print

dictHistogram(extCount)

sys.stdout = logger.stdout  # restore the console
logger.close()              # close the log file
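On the ‘globals’ question: one more Pythonic route, sketched here rather than offered as a drop-in replacement, is to let os.walk() do the recursion and collections.Counter do the tallying, returning the totals instead of keeping them in module-level variables. This is Python 3 rather than the Python 2 above; summarize_tree and histogram_lines are hypothetical names for illustration, and directories that cannot be listed are silently skipped by os.walk() instead of being reported.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Walk the tree under root and return (file_count, total_bytes, ext_counts).

    ext_counts is a Counter mapping upper-cased extensions (e.g. ".PDF",
    or "" for files without one) to the number of files with that extension.
    """
    file_count = 0
    total_bytes = 0
    ext_counts = Counter()
    # os.walk() handles the recursion; unreadable directories are skipped
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            file_count += 1
            total_bytes += os.path.getsize(path)
            ext_counts[os.path.splitext(name)[1].upper()] += 1
    return file_count, total_bytes, ext_counts


def histogram_lines(ext_counts):
    """Format one text bar per extension: one '#' per whole percent of all files."""
    total = sum(ext_counts.values())
    lines = []
    # most_common() lists the most frequent extensions first
    for ext, n in ext_counts.most_common():
        share = round(n / total * 100, 2)
        lines.append(f"{n}\tFile Type {ext}\t{'#' * int(share)} {share}% of total")
    return lines
```

Usage might be `count, size, exts = summarize_tree("c:/temp")`, then `print("\n".join(histogram_lines(exts)))`; the set of extensions encountered is simply `sorted(exts)`, so no separate extensions list is needed.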



First 2 Months as GIS Analyst…

As mentioned back in June, I started a new job as a GIS Analyst for an oil & gas (E/P) company, and I’ve been having a blast. After straddling the worlds of GIS and engineering geology for a few years, I’m glad to finally get to focus on GIS. And there is no lack of challenges in the new job. The learning curve has been a joy!

The GIS department I’m part of has been tasked with building an enterprise GIS using ArcGIS Server, ArcSDE, SQL Server 2008, and MS Silverlight. Moreover, we are working on a Master Data Management Solution (MDMS) that will make data from multiple departments available through the Silverlight-powered web app.

For me, the challenge has therefore been, and continues to be, twofold:

1) Brush up on many aspects of ESRI technology:

ArcGIS Server, ArcSDE, Python, Silverlight

2) Learn about a whole range of oil/gas industry software applications:

IHS Petra, SMT Kingdom, NeuraSection, Landworks, Merrick/RIO, and the list goes on

and how they might become part of the MDMS using NeuraDB, Volant, and FME

I guess you could say we have our work cut out for us.

