XML Parsing with Python (Turning Objects into XML)

What I said at the end of my last post got me thinking about XML, and I have been reading about reading and writing XML with Python. There is a tremendous amount of information out there, but when you're starting from square one it can be confusing because there are so many different XML modules to choose from. I'll post a few links later. For now, here is a very short piece of code that opens an XML file, reads through it, and looks for elements named "POINT":

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class gintHandler(ContentHandler):

    def startElement(self, name, attrs):
        # Called once for every opening tag the parser encounters
        if name == "POINT":
            print("Found a Point")
            # Print each attribute on the tag as "name value"
            for (k, v) in attrs.items():
                print(k + " " + v)

parser = make_parser()
parser.setContentHandler(gintHandler())
parser.parse(open("point.xml", "r"))

The test XML file I used was a simplified version of my data exported from gINT (see previous posts), and looked like this:

<?xml version="1.0"?>
<dataroot>
<POINT>
<North>10342486</North>
<East>3259140</East>
</POINT>

<POINT>
<North>10342671</North>
<East>3259434</East>
</POINT>

</dataroot>

OUTPUT:

Found a Point
Found a Point
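
The loop over attrs.items() printed nothing for this file. As a rough sketch of what that loop is for, here is a similar handler fed a POINT tag that does carry an attribute. The Date attribute, the AttrHandler name, and the inline sample string are made up for illustration and are not part of my gINT export:

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class AttrHandler(ContentHandler):
    """Records the attributes of every POINT element it sees."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.found = []

    def startElement(self, name, attrs):
        if name == "POINT":
            # attrs maps attribute names to their string values
            self.found.append(dict(attrs.items()))


# Hypothetical input: a POINT tag with a made-up Date attribute
sample = b'<dataroot><POINT Date="08/22/2010"><North>10342486</North></POINT></dataroot>'

handler = AttrHandler()
parseString(sample, handler)
print(handler.found)  # [{'Date': '08/22/2010'}]
```

Note that North is still not reported, because it is a child element and not an attribute of the POINT tag.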

The code I ended up using was adapted from a number of sources. Most helpful (easiest) was this Devshed page. But as you might've noticed, the reference to attrs (attributes) and the two lines iterating through attrs.items() aren't all that helpful here. I learned that XML tags can have attributes, e.g. <POINT Date="08/22/2010">, and those two lines print those attributes. But my XML file doesn't have any attributes, which is why the output shows nothing but "Found a Point". The values I want as attributes of a Python POINT object are stored in tags enclosed by the POINT tags, a parent-child relationship:

<PARENT>
<CHILD></CHILD>
</PARENT>
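
To get at values stored in child tags like this, a SAX handler also needs the characters() and endElement() callbacks, not just startElement(). The following is a minimal sketch of one way to do it, not the Cookbook recipe. The PointHandler class and its points, current, and buffer fields are names I made up, and the sample string is the test file from above pasted inline:

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class PointHandler(ContentHandler):
    """Collects each POINT's child elements into a dict of tag name -> text."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.points = []      # one dict per finished POINT
        self.current = None   # dict for the POINT being read, if any
        self.buffer = []      # text pieces of the child tag being read

    def startElement(self, name, attrs):
        if name == "POINT":
            self.current = {}
        elif self.current is not None:
            self.buffer = []  # start collecting text for a child tag

    def characters(self, content):
        # The parser may deliver one element's text in several pieces
        if self.current is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "POINT":
            self.points.append(self.current)
            self.current = None
        elif self.current is not None:
            self.current[name] = "".join(self.buffer).strip()
            self.buffer = []


sample = b"""<?xml version="1.0"?>
<dataroot>
<POINT>
<North>10342486</North>
<East>3259140</East>
</POINT>
<POINT>
<North>10342671</North>
<East>3259434</East>
</POINT>
</dataroot>"""

handler = PointHandler()
parseString(sample, handler)
for point in handler.points:
    print(point)
```

Each POINT ends up as a plain dict with the keys North and East. The values are still strings, so they would need int() or float() before doing any arithmetic with them.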

The most familiar example of this type of structure is an HTML table:

<TABLE>
  <TR>
    <TD></TD>
    <TD></TD>
  </TR>
  <TR>
    <TD></TD>
    <TD></TD>
  </TR>
</TABLE>

I found a good code example for what I am trying to achieve in O'Reilly's Python Cookbook (Recipe 12.5, Converting an XML Document into a Tree of Python Objects), but I had some trouble adapting it to my needs. Recipe 12.7, for parsing Excel XML files, was simpler… [to be continued]
