I am somewhat new to Python and programming in general, so I apologize if this is basic. Thanks in advance.
I am parsing an XML document (KML specifically, which is used in Google Earth) using Python 2.5, cElementTree and expat. I am trying to pull out all the text from the 'name', 'description' and 'coordinates' nodes inside each 'placemark' node for each geometry type (i.e. polylines, polygons, points), but I want to keep the geometry types separate. For example, I want only the 'name', 'description' and 'coordinates' text for every placemark that is part of a 'polygon' (i.e. it has a 'polygon' node). I will need to do this for 'polylines' and 'points' also. I have figured out a way to do this, but the code is long, verbose and specific to each geometry type, which leads me to my question.
Ideally, I would like to use the same code for each geometry type, but the problem is that each geometry type has a different node structure (i.e. different node names and a different number of nested nodes). So, as a proof of concept, I thought this would be a good opportunity to use and learn recursion to drill down through the node tree of each 'placemark' node and get the information I am looking for. I have looked at the many posts on Python recursion and am still having problems implementing the solutions provided.
The sample XML for a 'placemark' node is:
<Placemark>
  <name>testPolygon</name>
  <description>polygon text</description>
  <styleUrl>#msn_ylw-pushpin</styleUrl>
  <Polygon>
    <tessellate>1</tessellate>
    <outerBoundaryIs>
      <LinearRing>
        <coordinates>
          -81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0
        </coordinates>
      </LinearRing>
    </outerBoundaryIs>
  </Polygon>
</Placemark>
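To make the nesting in the sample above concrete, here is a minimal sketch that lists the tag path of every node. It assumes the sample has no XML namespace (real KML files usually declare one, which changes the tag names), uses `xml.etree.ElementTree` rather than cElementTree, and uses `list(node)` in place of `getchildren()`. The names `SAMPLE` and `tag_paths` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample placemark from above, without any namespace (an assumption).
SAMPLE = """<Placemark>
  <name>testPolygon</name>
  <description>polygon text</description>
  <styleUrl>#msn_ylw-pushpin</styleUrl>
  <Polygon>
    <tessellate>1</tessellate>
    <outerBoundaryIs>
      <LinearRing>
        <coordinates>
          -81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0
        </coordinates>
      </LinearRing>
    </outerBoundaryIs>
  </Polygon>
</Placemark>"""


def tag_paths(node, prefix=''):
    """Return slash-separated tag paths for node and all its descendants."""
    path = prefix + '/' + node.tag if prefix else node.tag
    paths = [path]
    # list(node) yields the direct children of an element
    for child in list(node):
        paths.extend(tag_paths(child, path))
    return paths


root = ET.fromstring(SAMPLE)
for p in tag_paths(root):
    print(p)
```

For this sample the 'coordinates' node sits four levels below 'Placemark'; other geometry types would presumably show a different depth and different intermediate tags.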
The recursion function I am using is:
def getCoords( child, searchNode ):
    # Get children of node
    children = child.getchildren()
    # If node has one or more child
    if len( children ) >= 1 :
        # Loop through all the children
        for child in children:
            # call to recursion function
            getCoords( child, searchNode )
    # If does not have children and is the 'searchNode'
    elif len( children ) == 0 and child.tag == searchNode:
        # Return the text inside the node. This is where it is not working
        # Other posts recommended returning the function like
        # return getCoords(child, searchNode), but I am getting an unending loop
        return child.text
    # Do nothing if node doesn't have children and does not match 'searchNode'
    else:
        print 'node does not have children and is not what we are looking for'
I am calling the recursion function like:
searchNode = 'coordinates'
# loop through all 'Placemark nodes' in document
for mark in placemark:
    # Get children of 'Placemark' node
    children = mark.getchildren()
    # Loop through children nodes
    for child in children:
        # if a 'Polygon' node is found
        if child.tag == 'Polygon':
            # call recursion function
            getCoords( child, searchNode)
I realize that at least part of my problem is the return value. Other posts recommended returning the function call itself, which I interpreted to mean 'return getCoords(child, searchNode)', but with that I get an unending loop. Also, I realize this could be posted on the GIS site, but I think this is more of a general programming question. Any ideas?
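For reference, a minimal sketch of how the return value could be propagated back up the recursion. In the function above, the result of the recursive call inside the loop is discarded, so the outermost call falls through and returns None. The sketch below captures each child's result and returns the first one that is not None. It is an illustration under assumptions, not a definitive fix: the sample is assumed to have no XML namespace, `list(node)` stands in for `getchildren()`, and the names `find_text` and `SAMPLE` are invented here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample placemark, without any namespace (an assumption).
SAMPLE = """<Placemark>
  <name>testPolygon</name>
  <description>polygon text</description>
  <Polygon>
    <outerBoundaryIs>
      <LinearRing>
        <coordinates>
          -81.4065,31.5072,0 -81.41269,31.45992,0 -81.34490,31.459696,0
        </coordinates>
      </LinearRing>
    </outerBoundaryIs>
  </Polygon>
</Placemark>"""


def find_text(node, search_tag):
    """Return the text of the first node (depth-first) whose tag matches, else None."""
    # Base case: this node is the one being searched for
    if node.tag == search_tag:
        return node.text
    # Recursive case: ask each child, and pass the first hit back to the caller
    for child in list(node):
        result = find_text(child, search_tag)
        if result is not None:
            return result
    # No match anywhere below this node
    return None


placemark = ET.fromstring(SAMPLE)
for child in list(placemark):
    if child.tag == 'Polygon':
        coords = find_text(child, 'coordinates')
        print(coords.strip())
```

Because the search only looks at tag names, the same function could in principle be called on a 'LineString' or 'Point' node without changes, though that has not been checked against real KML here.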