Using pandas read_xml to generate a Dataframe

Question

I am trying to read an XML document (shown below) into a pandas DataFrame. I am using df = pd.read_xml("doc.xml", xpath="//Generator"), but I keep getting this error: ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath. I have also tried passing xmlns=url and xmlns:xsi=url through the namespaces= parameter, with no luck. I can't figure out what I am doing wrong. Any help would be greatly appreciated.

My XML document looks like this:

<?xml-stylesheet type="text/xsl" href="url"?>
<IMODocument docID="document" xmlns="url" xmlns:xsi="url" xsi:schemaLocation="url">
    <IMODocHeader>
        <DocTitle>
            Generators Output and Capability Report
        </DocTitle>
        <DocRevision>
            4
        </DocRevision>
        <DocConfidentiality>
            <DocConfClass>
                PUB
            </DocConfClass>
        </DocConfidentiality>
        <CreatedAt>
            2021-08-18T10:15:50
        </CreatedAt>
    </IMODocHeader>
    <IMODocBody>
        <Date>
            2021-08-18
        </Date>
        <Generators> <!-- portion I'm trying to load into a DataFrame -->
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
        </Generators> <!-- end of that portion -->
    </IMODocBody>
</IMODocument>

You can use the xmltodict package instead: pypi.org/project/xmltodict. It loads the XML as a dictionary, and then you can use pd.json_normalize() to convert it to a DataFrame.
– Babak Fi Foo, Commented Aug 19, 2021 at 15:32
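
A minimal sketch of the xmltodict route suggested in this comment. The sample is a trimmed-down stand-in for the document in the question, and the child elements Name and Fuel are hypothetical, since the real Generator nodes are elided there. It assumes xmltodict is installed.

```python
import pandas as pd
import xmltodict

# Trimmed-down stand-in for the question's document.
# <Name> and <Fuel> are hypothetical children; the real ones are not shown.
xml_text = """
<IMODocument docID="document" xmlns="url">
    <IMODocBody>
        <Generators>
            <Generator><Name>GEN-A</Name><Fuel>HYDRO</Fuel></Generator>
            <Generator><Name>GEN-B</Name><Fuel>WIND</Fuel></Generator>
        </Generators>
    </IMODocBody>
</IMODocument>
""".strip()

# Parse the XML into nested dictionaries keyed by element name.
doc = xmltodict.parse(xml_text)
generators = doc["IMODocument"]["IMODocBody"]["Generators"]["Generator"]

# xmltodict yields a single dict (not a list) when only one child is present.
if not isinstance(generators, list):
    generators = [generators]

# One row per <Generator>, one column per child element.
df = pd.json_normalize(generators)
print(df)
```

With its default settings xmltodict keeps element names unprefixed, so the default namespace on the root element does not get in the way of the dictionary lookups here.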

Jack Fleeting · Accepted Answer · 2021-08-19 16:19:58Z

3

This doesn't seem to have anything to do with namespaces. If you change your xpath expression from an exact element-name match to a contains() test on the element name, like this:

df = pd.read_xml("doc.xml", xpath='//*[contains(name(),"Generator")]')

it seems to work, at least for me with your sample xml.

Not sure why this happens; a bug?
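
One possible explanation, offered as an assumption rather than something confirmed in this thread: the root element declares a default namespace (xmlns="url"), and in XPath an unprefixed name such as Generator only matches elements that are in no namespace. The sketch below binds that URI to an arbitrary prefix, d, and uses it in the xpath. The Name and Fuel children are hypothetical, because the sample Generator nodes are empty and read_xml needs row nodes that have children or attributes. parser="etree" is used so that only the standard-library parser is needed, which is also why the path starts with ".//".

```python
import io

import pandas as pd

# Trimmed-down stand-in for the question's document.
# <Name> and <Fuel> are hypothetical children; the real ones are not shown.
xml_text = """<IMODocument docID="document" xmlns="url">
    <IMODocBody>
        <Generators>
            <Generator><Name>GEN-A</Name><Fuel>HYDRO</Fuel></Generator>
            <Generator><Name>GEN-B</Name><Fuel>WIND</Fuel></Generator>
        </Generators>
    </IMODocBody>
</IMODocument>"""

# The unprefixed name finds nothing, because every element is in namespace "url".
try:
    pd.read_xml(io.StringIO(xml_text), xpath=".//Generator", parser="etree")
    unprefixed_error = None
except ValueError as err:
    unprefixed_error = err

# Map the default namespace URI to a prefix of our choosing and use it in xpath.
df = pd.read_xml(
    io.StringIO(xml_text),
    xpath=".//d:Generator",
    namespaces={"d": "url"},
    parser="etree",
)
print(df)
```

With the default lxml parser the same idea would be written as xpath="//d:Generator". Note also that contains(name(), "Generator") matches the parent Generators element as well as each Generator, since both names contain that substring, which may account for output that looks different from what was expected.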

2 Comments

Zachary McArthur Over a year ago

This is weird... It's not formatting how I expected it to. I thought this was going to be as easy as pandas read_csv or read_excel, but it seems much more complex.

Jack Fleeting Over a year ago

@ZacharyMcArthur Indeed; read_xml() is brand new, so that may be part of the issue.
