I have a response from a backend API, shown below. I want to extract the pid value "1664953412.79414".

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" shp_request_proxied_from="3DB91F64-892E-4DB2-9271-C5CB5CAFBFBB">
    <title>jobs</title>
    <updated>2022-10-05T10:48:30-07:00</updated>
    <author>
        <name>Splunk</name>
    </author>
    <opensearch:totalResults>1</opensearch:totalResults>
    <entry>
        <published>2022-10-05T00:03:34.000-07:00</published>
        <author>
            <name>abc-pull</name>
        </author>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>

I have tried various approaches but I am not able to extract out the data.

from xml.dom import minidom
pid = minidom.parseString(response.text).getElementsByTagName('pid')[0].childNodes[0].nodeValue

Then I tried this:

import xml.etree.ElementTree as ET
root = ET.fromstring(response.text)
print(root.tag)
print(root.find('entry')) 

But I am not getting the entry tag data properly either. Can someone please help? Note: I cannot use xmltodict, as it is not available in my enterprise packages.

1 Answer

You can use BeautifulSoup to pull the text of the s:key tag whose name attribute is "pid"; it is a capable parser for HTML and XML content. Note that the 'lxml' parser used here requires the lxml package to be installed alongside bs4.

xml_doc = '''
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xml" href="/static/atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" shp_request_proxied_from="3DB91F64-892E-4DB2-9271-C5CB5CAFBFBB">
    <title>jobs</title>
    <updated>2022-10-05T10:48:30-07:00</updated>
    <author>
        <name>Splunk</name>
    </author>
    <opensearch:totalResults>1</opensearch:totalResults>
    <entry>
        <published>2022-10-05T00:03:34.000-07:00</published>
        <author>
            <name>abc-pull</name>
        </author>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>
'''

from bs4 import BeautifulSoup
# Raw string so the backslash escaping the colon reaches the CSS selector unchanged
pid = BeautifulSoup(xml_doc, 'lxml').select_one(r's\:key[name="pid"]').text
print(pid)

Output:

1664953412.79414

4 Comments

@Fazlul, can this be done without BeautifulSoup?
Since the selector treats it as a pseudo-class, the xml parser will not work here. I've tested it.
@pythonNinja An alternative is to use XPath.
OK, so ElementTree can't be used.
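
For what it's worth, the standard-library ElementTree should be able to handle this once the namespaces are taken into account. The feed declares a default Atom namespace and an "s" prefix for the Splunk REST namespace, so a bare root.find('entry') matches nothing. Below is a minimal sketch, assuming the response body looks like the XML in the question; the helper name extract_pid and the "atom" prefix are made up for illustration, and the sample document is trimmed to the relevant elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: the real response has the same shape).
xml_doc = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:s="http://dev.splunk.com/ns/rest">
    <entry>
        <content type="text/xml">
            <s:dict>
                <s:key name="pid">1664953412.79414</s:key>
            </s:dict>
        </content>
    </entry>
</feed>"""

# Prefix-to-URI map passed to find(); the prefixes are local names chosen here,
# only the URIs have to match the ones declared in the document.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "s": "http://dev.splunk.com/ns/rest",
}


def extract_pid(xml_text):
    """Return the text of <s:key name="pid">, or None if it is absent (hypothetical helper)."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Search all descendants for an s:key element whose name attribute is "pid".
    key = root.find(".//s:key[@name='pid']", NS)
    return key.text if key is not None else None


root = ET.fromstring(xml_doc.encode("utf-8"))
print(root.find("entry"))            # no namespace given, so nothing matches
print(root.find("atom:entry", NS))   # namespace-qualified lookup finds the entry
print(extract_pid(xml_doc))
```

With a real response you would pass response.text (or response.content directly to ET.fromstring) instead of xml_doc. The same idea explains why the minidom attempt fails: there is no element named pid, only an s:key element with a name="pid" attribute.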
