I am trying to read an XML document (shown below) into a pandas DataFrame. I am using df = pd.read_xml("doc.xml", xpath="//Generator"), but I keep getting this error: ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath. I have also tried passing the xmlns=url and xmlns:xsi=url values to the namespaces= parameter, with no luck. I can't figure out what I am doing wrong. Any help would be greatly appreciated.

My XML document looks like this:

<?xml-stylesheet type="text/xsl" href="url"?>
<IMODocument docID="document" xmlns="url" xmlns:xsi="url" xsi:schemaLocation="url">
    <IMODocHeader>
        <DocTitle>
            Generators Output and Capability Report
        </DocTitle>
        <DocRevision>
            4
        </DocRevision>
        <DocConfidentiality>
            <DocConfClass>
                PUB
            </DocConfClass>
        </DocConfidentiality>
        <CreatedAt>
            2021-08-18T10:15:50
        </CreatedAt>
    </IMODocHeader>
    <IMODocBody>
        <Date>
            2021-08-18
        </Date>
        <Generators> //Portion i'm trying to write into a data frame
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
            <Generator>
            </Generator>
        </Generators> // ----------------end-------------------
    </IMODocBody>
</IMODocument>
  • You can use the xmltodict package instead: pypi.org/project/xmltodict. It loads the XML as a dictionary, and then you can use pd.json_normalize() to convert it to a DataFrame. Commented Aug 19, 2021 at 15:32
  • What should the DataFrame look like? Commented Aug 19, 2021 at 18:09

1 Answer

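This looks like a namespace issue after all: the root element declares a default namespace (xmlns="url"), so every element in the document, including Generator, belongs to it, and the unprefixed //Generator only matches elements in no namespace. One way to sidestep that is to change your xpath expression from an exact name test to contains(), like this:

df = pd.read_xml("doc.xml", xpath='//*[contains(name(),"Generator")]')

It seems to work, at least for me with your sample XML.

Note that contains() also matches the Generators wrapper element, so the result can include more than the individual Generator nodes.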
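For comparison, here is a minimal sketch of how the default namespace affects name matching, using only the standard library's ElementTree rather than pandas. The XML is a trimmed-down stand-in for the question's document; "url" is the placeholder namespace from the question, the prefix "d" is an arbitrary choice, and the <Name> child is a hypothetical element added only so the nodes are not empty.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the question's document. "url" is the placeholder
# namespace from the question; <Name> is a hypothetical child element.
XML = """<IMODocument docID="document" xmlns="url">
    <IMODocBody>
        <Generators>
            <Generator><Name>A</Name></Generator>
            <Generator><Name>B</Name></Generator>
        </Generators>
    </IMODocBody>
</IMODocument>"""

root = ET.fromstring(XML)

# Unprefixed name: looks for <Generator> in *no* namespace, so nothing matches.
plain = root.findall(".//Generator")

# Prefixed name: "d" is an arbitrary prefix mapped to the default namespace URI.
ns = {"d": "url"}
qualified = root.findall(".//d:Generator", ns)

print(len(plain), len(qualified))  # prints: 0 2
```

The same idea should carry over to pandas by passing the mapping through the namespaces parameter, for example pd.read_xml("doc.xml", xpath="//d:Generator", namespaces={"d": "url"}), with "url" replaced by the real namespace URI from the document's xmlns attribute.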


2 Comments

This is weird... It's not formatting how I expected it to. I kind of thought this was going to be as easy as pandas read_csv or read_excel, but this seems much more complex.
@ZacharyMcArthur Indeed; read_xml() is brand new, so that may be part of the issue.
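
If the read_xml() output is not shaped the way you expect, one fallback is to build the rows yourself and hand them to pandas. The sketch below is only an illustration: the sample in the question shows empty Generator elements, so the <Name>, <Details> and <FuelType> children here are hypothetical, as is the helper name generator_rows. It uses the standard library's ElementTree with the placeholder namespace "url" mapped to an arbitrary prefix "d".

```python
import xml.etree.ElementTree as ET

# Hypothetical content: the question does not show what is inside <Generator>,
# so <Name>, <Details> and <FuelType> are made up for illustration.
XML = """<IMODocument xmlns="url">
    <IMODocBody>
        <Generators>
            <Generator>
                <Name> G1 </Name>
                <Details><FuelType>HYDRO</FuelType></Details>
            </Generator>
            <Generator>
                <Name> G2 </Name>
                <Details><FuelType>WIND</FuelType></Details>
            </Generator>
        </Generators>
    </IMODocBody>
</IMODocument>"""

# Arbitrary prefix "d" mapped to the document's default namespace.
NS = {"d": "url"}


def generator_rows(xml_text):
    """Return one flat dict per <Generator>, pulling values from nested children."""
    root = ET.fromstring(xml_text)
    rows = []
    for gen in root.findall(".//d:Generator", NS):
        rows.append({
            # strip() removes the whitespace padding around the element text
            "Name": gen.findtext("d:Name", default="", namespaces=NS).strip(),
            "FuelType": gen.findtext(
                "d:Details/d:FuelType", default="", namespaces=NS
            ).strip(),
        })
    return rows


rows = generator_rows(XML)
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) to get one row per generator, with the column names chosen in the helper.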
