Python XML response parsing having nested tags

Question

I have a response from a backend API, shown below. I want to extract the pid value "1664953412.79414".

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" shp_request_proxied_from="3DB91F64-892E-4DB2-9271-C5CB5CAFBFBB">
    <title>jobs</title>
    <updated>2022-10-05T10:48:30-07:00</updated>
    <author>
        <name>Splunk</name>
    </author>
    <opensearch:totalResults>1</opensearch:totalResults>
    <entry>
        <published>2022-10-05T00:03:34.000-07:00</published>
        <author>
            <name>abc-pull</name>
        </author>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>

I have tried various approaches, but I am not able to extract the data. First I tried minidom:

from xml.dom import minidom
pid = minidom.parseString(response.text).getElementsByTagName('pid')[0].childNodes[0].nodeValue
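The minidom attempt fails because getElementsByTagName('pid') searches for elements named pid, and there are none (so the [0] index raises an IndexError). In this response, pid is the value of the name attribute on an s:key element. A minimal minidom sketch, assuming response.text looks like the sample (SAMPLE is a trimmed copy, and extract_pid_minidom is a hypothetical helper name), looks up key elements in the Splunk namespace and filters on that attribute:

```python
from xml.dom import minidom

# Trimmed copy of the response above (assumption: response.text looks like this).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

SPLUNK_NS = "http://dev.splunk.com/ns/rest"


def extract_pid_minidom(xml_text):
    """Hypothetical helper: return the text of <s:key name="pid">, or None."""
    doc = minidom.parseString(xml_text)
    # Match on namespace URI + local name, so the "s:" prefix itself doesn't matter.
    for key in doc.getElementsByTagNameNS(SPLUNK_NS, "key"):
        if key.getAttribute("name") == "pid":
            # The pid is the element's first (text) child node.
            return key.firstChild.nodeValue
    return None


print(extract_pid_minidom(SAMPLE))  # 1664953412.79414
```

Matching on the namespace URI rather than the literal "s:key" keeps the lookup working even if the server ever switches to a different prefix.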

Then I tried this:

import xml.etree.ElementTree as ET
root = ET.fromstring(response.text)
print(root.tag)
print(root.find('entry'))
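root.find('entry') returns None because the root element declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), so ElementTree stores the element as {http://www.w3.org/2005/Atom}entry rather than entry; that is also why print(root.tag) shows {http://www.w3.org/2005/Atom}feed. A minimal ElementTree sketch, assuming the same response shape (the NS map and the "atom" prefix are names chosen here for illustration), passes a prefix-to-URI map to find and uses the [@name='pid'] predicate that ElementTree's path syntax supports:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above (assumption: response.text looks like this).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
    <title>jobs</title>
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

# Prefix -> namespace URI. These prefixes are local to the lookups below;
# they do not have to match the prefixes used in the document.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "s": "http://dev.splunk.com/ns/rest",
}

root = ET.fromstring(SAMPLE)
print(root.tag)  # {http://www.w3.org/2005/Atom}feed

# An unprefixed name only matches elements with no namespace, hence None.
print(root.find("entry"))  # None

entry = root.find("atom:entry", NS)
key = entry.find("atom:content/s:dict/s:key[@name='pid']", NS)
print(key.text)  # 1664953412.79414
```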

But I'm not getting the entry tag data properly either. Can someone please help? Note: I cannot use xmltodict, as it's not available in my enterprise packages.

Md. Fazlul Hoque · Accepted Answer · 2022-10-06 11:01:13Z


You can use BeautifulSoup to pull the text of the s:key tag whose name attribute is "pid"; BeautifulSoup can parse both HTML and XML documents.

xml_doc = '''
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" shp_request_proxied_from="3DB91F64-892E-4DB2-9271-C5CB5CAFBFBB">
    <title>jobs</title>
    <updated>2022-10-05T10:48:30-07:00</updated>
    <author>
        <name>Splunk</name>
    </author>
    <opensearch:totalResults>1</opensearch:totalResults>
    <entry>
        <published>2022-10-05T00:03:34.000-07:00</published>
        <author>
            <name>abc-pull</name>
        </author>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>
'''

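from bs4 import BeautifulSoup

# 'lxml' is lxml's HTML parser, which keeps "s:key" as a literal tag name,
# so the colon is escaped in the CSS selector; the raw string stops Python
# from treating "\:" as an (invalid) escape sequence.
pid = BeautifulSoup(xml_doc, 'lxml').select_one(r's\:key[name="pid"]').text
print(pid)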

Output:

1664953412.79414

edited Oct 6, 2022 at 11:01

answered Oct 6, 2022 at 10:23


4 Comments

pythonNinja Over a year ago

@Fazlul, can this be done without BeautifulSoup?

Md. Fazlul Hoque Over a year ago

As it's a pseudo-class, xml will not work here. I've tested it.

Md. Fazlul Hoque Over a year ago

@pythonNinja The alternative is to use XPath.

pythonNinja Over a year ago

OK, so ElementTree can't be used.
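For what it's worth, the standard library's ElementTree does support a limited subset of XPath, including the .// descendant search and [@attr='value'] predicates, which is enough for this lookup without third-party packages. A minimal sketch, assuming response.text looks like the trimmed SAMPLE below (NS is a name chosen here for illustration), uses findtext for a single pid and findall in case the feed ever lists several jobs:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above (assumption: response.text looks like this).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

# Only the Splunk prefix is needed, because ".//" skips over the Atom elements.
NS = {"s": "http://dev.splunk.com/ns/rest"}

root = ET.fromstring(SAMPLE)

# findtext returns the first match's text, or None when nothing matches.
pid = root.findtext(".//s:key[@name='pid']", namespaces=NS)
print(pid)  # 1664953412.79414

# findall returns every match as a list of elements.
pids = [key.text for key in root.findall(".//s:key[@name='pid']", NS)]
print(pids)  # ['1664953412.79414']
```

Because findtext returns None instead of raising when the element is missing, the caller can check for an empty or error response before using the value.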
