Cannot read this xml with python?

Question

I know next to nothing about xhtml. And I've got to write a python script to edit a table. But the wiki page I have to edit is for some reason not being read by any of the python xml parsers, and I haven't a clue what's going on. This is a sample page of the wiki. Can anyone tell me what the heck is wrong with this?

<h2>test</h2><p>&nbsp;</p><p><strong>I am a test</strong></p><p>&nbsp;</p><p>Now I need a table</p><table>
<tbody>
<tr>
    <th>name</th>
    <th>column</th>
</tr>
<tr>
    <td>data1</td>
    <td><p>data2</p></td>
</tr>
</tbody>
</table><p>&nbsp;</p><p>&nbsp;</p>

Here's a bit of the code I've been trying to read this with. I've gone through several iterations and different xml parsers, the pulldom, xml.dom, ElementTree, minidom, etc. They're all giving the same exception:

import sys
from xml.etree import ElementTree as ET

def main(argv):
    fileName = "/home/robbnic/Source/scripts/Gesture Service Dashboard.txt"
    try:
        # Stream through the file and look for the table element.
        for event, elem in ET.iterparse(fileName):
            if elem.tag == "table":
                print("Hot damn!")
                elem.clear()
    except ET.ParseError as pe:
        # ParseError carries the expat error code and a (line, column) position.
        print(pe)
        print(pe.code)
        print(pe.position)
    except Exception:
        print("Unexpected error:", sys.exc_info()[0])
        raise

if __name__ == "__main__":
    main(sys.argv)

The exception error I keep getting is unbound prefix, but I know so little about xml (or xhtml in this case) that I just don't know what's going on.

Can you include some of the code that's causing the problem? – Trevor, Commented Jul 20, 2012 at 3:19
Ah, it's supposed to be xhtml from a confluence site. Lemme edit my post! – user1527741, Commented Jul 20, 2012 at 3:19
What do you mean by "edit a table"? Are you looking to change the original XHTML code within the document (inserting or replacing code), or are you looking to access the value of a particular node and manipulate the stored value in memory? (like for web scraping) – abought, Commented Jul 20, 2012 at 3:24
I mean I'll need to edit the contents of the cells, or add new rows or columns based on an input text file full of data. – user1527741, Commented Jul 20, 2012 at 3:25

Ignacio Vazquez-Abrams · Accepted Answer · 2012-07-20 03:23:35Z

2

You're missing a single root tag. An XML document must have exactly one root element, but yours has several top-level elements (the h2, the p elements, the table, and so on).
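
A minimal sketch of that fix, assuming the page text is already in a string. The wrapper name `root` and the helper `parse_fragment` are made up for illustration. The `&nbsp;` replacement is an extra step beyond the missing root: XML predefines only five named entities, so the parser would likely reject `&nbsp;` even once the fragment is wrapped.

```python
from xml.etree import ElementTree as ET

# Shortened version of the sample page: several top-level elements, no root.
fragment = (
    "<h2>test</h2><p>&nbsp;</p>"
    "<table><tbody>"
    "<tr><th>name</th><th>column</th></tr>"
    "<tr><td>data1</td><td><p>data2</p></td></tr>"
    "</tbody></table><p>&nbsp;</p>"
)

def parse_fragment(text):
    """Wrap a rootless XHTML fragment in one element and parse it.

    Hypothetical helper; the tag name "root" is arbitrary.
    """
    # Assumption: &nbsp; is the only HTML-only entity in the text.
    # Rewrite it as a numeric character reference, which XML accepts.
    text = text.replace("&nbsp;", "&#160;")
    return ET.fromstring("<root>" + text + "</root>")

root = parse_fragment(fragment)
table = root.find("table")      # direct child of the wrapper
rows = root.findall(".//tr")    # every row, at any depth
print(table is not None, len(rows))
```

Parsing from a string keeps the file on disk unchanged; the wrapper only exists in memory.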

2 Comments

user1527741 Over a year ago

What, you mean I just kind of stick something on there that says <root> ?

Ignacio Vazquez-Abrams Over a year ago

Sure. The exact tag name isn't terribly important, so long as it's valid XML. And don't forget to close it after.
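
Once the fragment parses, editing cells and adding rows (what the question is ultimately after) might look like the sketch below. It assumes the text has already been wrapped in a `<root>` element as suggested, and the values `changed`, `data3` and `data4` are placeholders rather than anything from the original page.

```python
from xml.etree import ElementTree as ET

# Sample table, already wrapped in a temporary root element.
wrapped = (
    "<root><table><tbody>"
    "<tr><th>name</th><th>column</th></tr>"
    "<tr><td>data1</td><td><p>data2</p></td></tr>"
    "</tbody></table></root>"
)
root = ET.fromstring(wrapped)
tbody = root.find("./table/tbody")

# Edit an existing cell: the first <td> found in document order.
first_cell = tbody.find("./tr/td")
first_cell.text = "changed"

# Append a new row with one cell per value.
row = ET.SubElement(tbody, "tr")
for value in ("data3", "data4"):
    ET.SubElement(row, "td").text = value

# Serialize only the children, so the temporary wrapper is dropped again.
result = "".join(ET.tostring(child, encoding="unicode") for child in root)
print(result)
```

Joining the serialized children leaves out any text sitting directly inside the wrapper before the first element, which is fine for a page like the sample where everything is inside a tag.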
