What I said at the end of my last post got me thinking about XML, and I have been reading up on reading and writing XML with Python. There is a tremendous amount of information out there, but when you’re starting from square 0, the sheer number of XML modules to choose from can be confusing. I’ll post a few links later. For now, here is a very short piece of code that opens an XML file, reads through it, and looks for elements named “POINT”:
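from xml.sax.handler import ContentHandler
from xml.sax import make_parser

class PointHandler(ContentHandler):  # class name is illustrative
    def startElement(self, name, attrs):
        # Called by the parser for every opening tag it reads
        if name == "POINT":
            print("Found a Point")
            # Print any attributes on the tag as "name value"
            for k, v in attrs.items():
                print(k + " " + v)

parser = make_parser()
parser.setContentHandler(PointHandler())
parser.parse("points.xml")  # hypothetical file name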
The test XML file I used was a simplified version of my data exported from gINT (see previous posts). It contains two POINT elements and no attributes, so the output was:
Found a Point
Found a Point
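To try this without a file on disk, here is a minimal sketch that feeds a made-up document with two POINT elements to the same kind of handler using xml.sax.parseString. The class name PointHandler, the sample data, and the child tags PointID and Elevation are hypothetical stand-ins, not the actual gINT export. The handler also counts the points so the result can be checked.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical stand-in for the exported data: two POINT elements, no attributes.
# The child tag names (PointID, Elevation) are made up for illustration.
SAMPLE = b"""<?xml version="1.0"?>
<DATA>
  <POINT><PointID>B-1</PointID><Elevation>102.5</Elevation></POINT>
  <POINT><PointID>B-2</PointID><Elevation>98.0</Elevation></POINT>
</DATA>"""

class PointHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0  # number of POINT elements seen so far

    def startElement(self, name, attrs):
        # Called for every opening tag; react only to POINT
        if name == "POINT":
            self.count += 1
            print("Found a Point")
            # Neither sample POINT has attributes, so this loop prints nothing
            for k, v in attrs.items():
                print(k + " " + v)

handler = PointHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.count)
```

It prints “Found a Point” twice followed by 2. No attribute lines appear, because neither POINT tag carries any attributes.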
The code I ended up using was adapted from a number of sources; the most helpful (and easiest to follow) was this Devshed page. But as you might’ve noticed, the reference to attrs (attributes) and the two lines iterating through attrs.items() don’t do much for my file. XML tags can carry attributes, e.g. <POINT Date="08/22/2010">, and those two lines print them as name/value pairs. My XML file doesn’t have any attributes, though. Instead, each point’s properties are stored in elements nested inside the POINT tags: a parent-child relationship.
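One way to reach those child elements with SAX is to keep track of which POINT is currently open and capture the text of each element inside it. The sketch below assumes only one level of nesting. The class name PointCollector and the tag names PointID and Elevation are hypothetical, and one sample POINT carries the Date attribute from the example above so that both cases show up in the result.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample: one POINT has a Date attribute, and every POINT
# stores its properties in child elements. Tag names are made up.
SAMPLE = b"""<?xml version="1.0"?>
<DATA>
  <POINT Date="08/22/2010"><PointID>B-1</PointID><Elevation>102.5</Elevation></POINT>
  <POINT><PointID>B-2</PointID><Elevation>98.0</Elevation></POINT>
</DATA>"""

class PointCollector(ContentHandler):
    """Builds one dict per POINT from its attributes and child elements."""

    def __init__(self):
        super().__init__()
        self.points = []      # finished POINT records
        self.current = None   # dict for the POINT being read, if any
        self.text = []        # text chunks for the element being read

    def startElement(self, name, attrs):
        if name == "POINT":
            # Attributes live on the tag itself, e.g. <POINT Date="...">
            self.current = dict(attrs.items())
        self.text = []

    def characters(self, content):
        # SAX may deliver one element's text in several chunks
        self.text.append(content)

    def endElement(self, name):
        if name == "POINT":
            self.points.append(self.current)
            self.current = None
        elif self.current is not None:
            # A child of POINT just closed: store its text under its tag name
            self.current[name] = "".join(self.text).strip()
        self.text = []

handler = PointCollector()
xml.sax.parseString(SAMPLE, handler)
for point in handler.points:
    print(point)
```

This prints one dictionary per point, starting with {'Date': '08/22/2010', 'PointID': 'B-1', 'Elevation': '102.5'}. The text is buffered in characters() and only stored in endElement() because a SAX parser is allowed to split an element’s text across several characters() calls.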
The most familiar example of this kind of structure is an HTML table, where a <table> element contains <tr> rows, which in turn contain <td> cells.
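When the file is small enough to load all at once, a tree-based parser makes this row-and-cell structure explicit. The sketch below swaps in the standard library’s xml.etree.ElementTree, a different technique from the SAX handler, and reads each POINT like a table row. The tag names PointID and Elevation are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: each POINT is a "row" whose children are the "cells"
SAMPLE = """<DATA>
  <POINT><PointID>B-1</PointID><Elevation>102.5</Elevation></POINT>
  <POINT><PointID>B-2</PointID><Elevation>98.0</Elevation></POINT>
</DATA>"""

root = ET.fromstring(SAMPLE)

# root.iter("POINT") finds every POINT; iterating a POINT yields its direct children
points = [
    {child.tag: (child.text or "").strip() for child in point}
    for point in root.iter("POINT")
]

for p in points:
    print(p)
```

The trade-off is memory: ElementTree holds the whole document in memory, whereas SAX streams through it one event at a time.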
I found a good code example for what I am trying to achieve in O’Reilly’s Python Cookbook (Recipe #12.5 – Converting an XML Document into a Tree of Python Objects), but I had some trouble adapting it to my needs. Recipe 12.7, for parsing Excel XML files, was simpler… [to be continued]