Hi, I am trying to convert my XML data to a pandas DataFrame, but I am unable to parse all of the data. It is a 13 MB XML file.

I want to extract the text inside "NodeName". I tried various approaches with ElementTree, but they all failed. My XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <Ib440ConfigGetResponse xmlns="http://Airspan.Netspan.WebServices">
            <Ib440ConfigGetResult>
                <ErrorCode>OK</ErrorCode>
                <NodeResult>
                    <NodeResultCode>OK</NodeResultCode>
                    <NodeName>INAPKVLIVGLRTW6001ENBIB5004</NodeName>
                    <Ib440Config>
                        <Name>INAPKVLIVGLRTW6001ENBIB5004</Name>
                        <Hardware>iBridge 440-221</Hardware>
                        <Description>I-AP-KVLI-ENB-6001</Description>
                        <ManagedMode>Managed</ManagedMode>
                        <Site>Kavali</Site>
                        <Region>Andhra Pradesh</Region>
                        <NbifEventAlarmForwarding>Enabled</NbifEventAlarmForwarding>
                        <ConfigMode>OptimizedModeC</ConfigMode>
                        <MediumAccessMethod>CSMA</MediumAccessMethod>
                        <WirelessProtocol>802.11n</WirelessProtocol>
                        <HtSupportedMcs>MCS0-15</HtSupportedMcs>
                        <VhtSupportedMcs>MCS0-7</VhtSupportedMcs>
                        <CellRadiusRange>Short</CellRadiusRange>
                        <GuardInterval>Long</GuardInterval>
                        <Frequency>5850</Frequency>

Below is the small piece of code I tried, but it shows only 4 lines.

import pandas_read_xml as pdx
import pandas as pd
df = pdx.read_xml('1111s.xml')
df

The result I get: [screenshot of the output, not included]

2 Answers

I'd give BeautifulSoup a try.

You could read the XML file into a BeautifulSoup object and then use its methods to get the elements you need (and convert the result into a DataFrame).

import pandas as pd
from bs4 import BeautifulSoup


# read the whole file into a string
with open("1111s.xml", "r") as f:
    xml_data = f.read()

# the "xml" parser requires lxml to be installed
soup = BeautifulSoup(xml_data, "xml")
soup.find("NodeName").get_text(strip=True)
# 'INAPKVLIVGLRTW6001ENBIB5004'


# in a loop: each sibling after the first NodeName (here, Ib440Config) becomes one row
data = []
for element in soup.find("NodeName").find_next_siblings():
    data.append({
        "Name": element.find("Name").get_text(strip=True),
        "Hardware": element.find("Hardware").get_text(strip=True),
        "Site": element.find("Site").get_text(strip=True)
    })

pd.DataFrame(data)
#     Name                        Hardware        Site
# 0   INAPKVLIVGLRTW6001ENBIB5004 iBridge 440-221 Kavali
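
The loop above starts from the first NodeName only. If the file holds many NodeResult blocks, something along these lines might collect one row per block. This is a minimal sketch using the standard library's ElementTree rather than BeautifulSoup; it assumes every NodeResult sits in the http://Airspan.Netspan.WebServices namespace shown in the question, and the second node in the inline sample is made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute on Ib440ConfigGetResponse.
NS = {"ns": "http://Airspan.Netspan.WebServices"}

# Child tags of Ib440Config to keep; adjust to the columns you need.
FIELDS = ["Name", "Hardware", "Site"]


def extract_nodes(root):
    """Return one dict per NodeResult element found under root."""
    records = []
    for node in root.iterfind(".//ns:NodeResult", NS):
        record = {"NodeName": node.findtext("ns:NodeName", namespaces=NS)}
        config = node.find("ns:Ib440Config", NS)
        for field in FIELDS:
            # A missing Ib440Config or a missing child tag yields None.
            record[field] = (
                config.findtext(f"ns:{field}", namespaces=NS)
                if config is not None
                else None
            )
        records.append(record)
    return records


# Shortened sample; the second NodeResult is hypothetical, not from the real file.
SAMPLE = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Ib440ConfigGetResponse xmlns="http://Airspan.Netspan.WebServices">
      <Ib440ConfigGetResult>
        <ErrorCode>OK</ErrorCode>
        <NodeResult>
          <NodeResultCode>OK</NodeResultCode>
          <NodeName>INAPKVLIVGLRTW6001ENBIB5004</NodeName>
          <Ib440Config>
            <Name>INAPKVLIVGLRTW6001ENBIB5004</Name>
            <Hardware>iBridge 440-221</Hardware>
            <Site>Kavali</Site>
          </Ib440Config>
        </NodeResult>
        <NodeResult>
          <NodeResultCode>OK</NodeResultCode>
          <NodeName>EXAMPLE-NODE-2</NodeName>
          <Ib440Config>
            <Name>EXAMPLE-NODE-2</Name>
            <Hardware>iBridge 440-221</Hardware>
          </Ib440Config>
        </NodeResult>
      </Ib440ConfigGetResult>
    </Ib440ConfigGetResponse>
  </soap:Body>
</soap:Envelope>
"""

records = extract_nodes(ET.fromstring(SAMPLE))
```

For the real file, `ET.parse("1111s.xml").getroot()` would take the place of `ET.fromstring(SAMPLE)`, and `pd.DataFrame(records)` should turn the list of dicts into a DataFrame.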

2 Comments

just an example to get you started
Thanks Mate for your kind support :)

Looking at the XML, you would need to make a list of the tags to navigate through to reach the "root" tag. My guess would be:

import pandas_read_xml as pdx

root_tag_list = ['soap:Envelope', 'soap:Body', 'Ib440ConfigGetResponse', 'Ib440ConfigGetResult', 'NodeResult', 'Ib440Config']

df = pdx.read_xml('1111s.xml', root_tag_list)

df

could work.

1 Comment

I get this error: TypeError: list indices must be integers or slices, not str
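
One possible reading of that TypeError, offered as an assumption and not verified against the library's internals: if the parsed XML is held as nested dicts, and a tag that repeats (such as NodeResult) becomes a list, then stepping through the tag list with string keys fails as soon as it reaches that list. The sketch below reproduces the same message on a hand-written structure. `walk`, `walk_lists`, and the `parsed` data are hypothetical names and values made up for illustration, not part of pandas_read_xml.

```python
def walk(data, keys):
    """Follow keys one by one, assuming every level is a dict."""
    for key in keys:
        data = data[key]
    return data


def walk_lists(data, keys):
    """Like walk, but fan out over any list met along the way."""
    nodes = [data]
    for key in keys:
        next_nodes = []
        for node in nodes:
            items = node if isinstance(node, list) else [node]
            next_nodes.extend(item[key] for item in items)
        nodes = next_nodes
    # Flatten a trailing list so the caller always gets a flat list.
    result = []
    for node in nodes:
        result.extend(node if isinstance(node, list) else [node])
    return result


# Hand-written stand-in for a parsed document; the second node is hypothetical.
parsed = {
    "Ib440ConfigGetResult": {
        "ErrorCode": "OK",
        "NodeResult": [
            {
                "NodeName": "INAPKVLIVGLRTW6001ENBIB5004",
                "Ib440Config": {"Name": "INAPKVLIVGLRTW6001ENBIB5004", "Site": "Kavali"},
            },
            {
                "NodeName": "EXAMPLE-NODE-2",
                "Ib440Config": {"Name": "EXAMPLE-NODE-2", "Site": "ExampleSite"},
            },
        ],
    }
}
path = ["Ib440ConfigGetResult", "NodeResult", "Ib440Config"]

try:
    walk(parsed, path)
    error = None
except TypeError as exc:
    # Indexing the NodeResult list with a string key raises here.
    error = str(exc)

configs = walk_lists(parsed, path)
```

If this is what happens with the real file, stopping the tag list one level earlier (at the repeated tag) or extracting the repeated elements directly may avoid the error.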
