XML to CSV using python pandas

Question

Hi, I am trying to convert my XML data to a pandas DataFrame, but I am unable to parse all of the data. It's a 13 MB XML file.

I want to extract the text inside "NodeName". I tried various approaches with ElementTree, but they all failed. Below is what my XML looks like:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <Ib440ConfigGetResponse xmlns="http://Airspan.Netspan.WebServices">
            <Ib440ConfigGetResult>
                <ErrorCode>OK</ErrorCode>
                <NodeResult>
                    <NodeResultCode>OK</NodeResultCode>
                    <NodeName>INAPKVLIVGLRTW6001ENBIB5004</NodeName>
                    <Ib440Config>
                        <Name>INAPKVLIVGLRTW6001ENBIB5004</Name>
                        <Hardware>iBridge 440-221</Hardware>
                        <Description>I-AP-KVLI-ENB-6001</Description>
                        <ManagedMode>Managed</ManagedMode>
                        <Site>Kavali</Site>
                        <Region>Andhra Pradesh</Region>
                        <NbifEventAlarmForwarding>Enabled</NbifEventAlarmForwarding>
                        <ConfigMode>OptimizedModeC</ConfigMode>
                        <MediumAccessMethod>CSMA</MediumAccessMethod>
                        <WirelessProtocol>802.11n</WirelessProtocol>
                        <HtSupportedMcs>MCS0-15</HtSupportedMcs>
                        <VhtSupportedMcs>MCS0-7</VhtSupportedMcs>
                        <CellRadiusRange>Short</CellRadiusRange>
                        <GuardInterval>Long</GuardInterval>
                        <Frequency>5850</Frequency>

Below is the small piece of code I tried, but it shows only 4 lines.

import pandas_read_xml as pdx
import pandas as pd
df = pdx.read_xml('1111s.xml')
df
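Since the question mentions failed ElementTree attempts (that code isn't shown, so this is a guess at the cause): `Ib440ConfigGetResponse` declares a default namespace (`xmlns="http://Airspan.Netspan.WebServices"`), so every element below it, including `NodeName`, is really named `{http://Airspan.Netspan.WebServices}NodeName` in ElementTree, and a plain `findall("NodeName")` matches nothing. A minimal standard-library sketch that passes a prefix mapping to `findall`; the prefix `ns`, the function `node_names`, and the shortened inline sample are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <Ib440ConfigGetResponse> in the question's XML.
# The prefix "ns" is arbitrary; only the URI has to match.
NS = {"ns": "http://Airspan.Netspan.WebServices"}

# Shortened, assumed copy of the question's document structure.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Ib440ConfigGetResponse xmlns="http://Airspan.Netspan.WebServices">
      <Ib440ConfigGetResult>
        <ErrorCode>OK</ErrorCode>
        <NodeResult>
          <NodeResultCode>OK</NodeResultCode>
          <NodeName>INAPKVLIVGLRTW6001ENBIB5004</NodeName>
        </NodeResult>
      </Ib440ConfigGetResult>
    </Ib440ConfigGetResponse>
  </soap:Body>
</soap:Envelope>"""


def node_names(root):
    """Return the text of every <NodeName> element at any depth."""
    # The "ns:" prefix is expanded to "{http://...}" using the NS mapping.
    return [(el.text or "").strip() for el in root.findall(".//ns:NodeName", NS)]


root = ET.fromstring(SAMPLE)  # for the real file: ET.parse("1111s.xml").getroot()
print(node_names(root))
print(root.findall(".//NodeName"))  # no namespace, so no matches
```

The same `NS` mapping works for any other tag under that response, e.g. `".//ns:Ib440Config"`.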


Hryhorii Pavlenko · Accepted Answer · 2020-08-14 20:24:48Z

2

I'd give BeautifulSoup a try.

You could read the XML file into a bs4 object, use bs4 methods to extract the element values you need, and then convert the result into a DataFrame.

import pandas as pd
from bs4 import BeautifulSoup  # the "xml" parser below requires lxml


with open("1111s.xml", "r", encoding="utf-8") as f:
    xml_data = f.read()

soup = BeautifulSoup(xml_data, "xml")
soup.find("NodeName").get_text(strip=True)
# 'INAPKVLIVGLRTW6001ENBIB5004'


# in a loop: one row per <Ib440Config> block in the file
data = []
for element in soup.find_all("Ib440Config"):
    data.append({
        "Name": element.find("Name").get_text(strip=True),
        "Hardware": element.find("Hardware").get_text(strip=True),
        "Site": element.find("Site").get_text(strip=True),
    })

df = pd.DataFrame(data)
print(df)
# output for the excerpt shown in the question:
#                           Name         Hardware    Site
# 0  INAPKVLIVGLRTW6001ENBIB5004  iBridge 440-221  Kavali
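The loop above picks three fields by name. To export every field of every `Ib440Config` block to CSV, as the title asks, one option is to build a dict from each block's direct children and let pandas write it out. This is a sketch that assumes each field is a direct child element holding plain text; `configs_to_frame` and the two-node `SAMPLE` (the second node is made up) are illustrative, and BeautifulSoup's "xml" parser needs lxml installed:

```python
import pandas as pd
from bs4 import BeautifulSoup  # BeautifulSoup's "xml" parser requires lxml

# Hypothetical sample: node 1 reuses values from the question, "NODE-B" is made up.
SAMPLE = """<Ib440ConfigGetResult xmlns="http://Airspan.Netspan.WebServices">
  <NodeResult>
    <NodeName>INAPKVLIVGLRTW6001ENBIB5004</NodeName>
    <Ib440Config>
      <Name>INAPKVLIVGLRTW6001ENBIB5004</Name>
      <Site>Kavali</Site>
    </Ib440Config>
  </NodeResult>
  <NodeResult>
    <NodeName>NODE-B</NodeName>
    <Ib440Config>
      <Name>NODE-B</Name>
      <Frequency>5850</Frequency>
    </Ib440Config>
  </NodeResult>
</Ib440ConfigGetResult>"""


def configs_to_frame(xml_text):
    """One row per <Ib440Config>, one column per direct child tag."""
    soup = BeautifulSoup(xml_text, "xml")
    rows = []
    for cfg in soup.find_all("Ib440Config"):
        # recursive=False keeps only direct children such as <Name> or <Site>
        rows.append({child.name: child.get_text(strip=True)
                     for child in cfg.find_all(recursive=False)})
    # Tags missing from a block become NaN (an empty cell in the CSV).
    return pd.DataFrame(rows)


df = configs_to_frame(SAMPLE)
print(df.to_csv(index=False))  # pass a path, e.g. "1111s.csv", to write a file
```

For the real data, pass the contents of `1111s.xml` instead of `SAMPLE`. If a field contains nested elements, `get_text` concatenates their text, so such columns may need separate handling.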


2 Comments

Hryhorii Pavlenko Over a year ago

Just an example to get you started.

Sonu Over a year ago

Thanks, mate, for your kind support :)

min · Accepted Answer · 2020-08-31 02:29:00Z

0

Looking at the XML, you would need to make a list of the tags to navigate through to reach the "root" tag. If I had to guess, then:

import pandas_read_xml as pdx

root_tag_list = ['soap:Envelope', 'soap:Body', 'Ib440ConfigGetResponse', 'Ib440ConfigGetResult', 'NodeResult', 'Ib440Config']

df = pdx.read_xml('1111s.xml', root_tag_list)

df

could work.


1 Comment

Sonu Over a year ago

I get this error: TypeError: list indices must be integers or slices, not str
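That `TypeError` is consistent with `NodeResult` appearing more than once in the file: dict-based XML converters such as `xmltodict` represent a repeated tag as a list, so indexing that list with `'Ib440Config'` fails. Whether `pandas_read_xml` fails for exactly this reason is an assumption. A minimal sketch of the same tag-by-tag navigation done with `xmltodict` directly, using its `force_list` option so `NodeResult` is always a list, then `pd.json_normalize` to turn each `Ib440Config` into a row; `node_configs` and the second node in `SAMPLE` are made up for illustration:

```python
import pandas as pd
import xmltodict

# Assumed structure from the question; the second <NodeResult> is made up.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Ib440ConfigGetResponse xmlns="http://Airspan.Netspan.WebServices">
      <Ib440ConfigGetResult>
        <ErrorCode>OK</ErrorCode>
        <NodeResult>
          <NodeName>INAPKVLIVGLRTW6001ENBIB5004</NodeName>
          <Ib440Config><Name>INAPKVLIVGLRTW6001ENBIB5004</Name><Site>Kavali</Site></Ib440Config>
        </NodeResult>
        <NodeResult>
          <NodeName>NODE-B</NodeName>
          <Ib440Config><Name>NODE-B</Name><Site>Example</Site></Ib440Config>
        </NodeResult>
      </Ib440ConfigGetResult>
    </Ib440ConfigGetResponse>
  </soap:Body>
</soap:Envelope>"""


def node_configs(xml_text):
    # force_list keeps NodeResult a list even if the file has only one node.
    doc = xmltodict.parse(xml_text, force_list=("NodeResult",))
    result = (doc["soap:Envelope"]["soap:Body"]
              ["Ib440ConfigGetResponse"]["Ib440ConfigGetResult"])
    # Loop over the NodeResult list instead of indexing it with a tag name.
    configs = [node["Ib440Config"] for node in result["NodeResult"]]
    # Nested child elements, if any, become dotted column names.
    return pd.json_normalize(configs)


print(node_configs(SAMPLE))
```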
