2

Sorry if this question is already asked. I have below XML file for which i want to convert it into CSV or excel. Here i want to extract NodeName and its child DestIPAddress under IpRoutelist. and Value under Custom/Name tag

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <EnbConfigGetResponse xmlns="http://Airspan.Netspan.WebServices">
            <EnbConfigGetResult>
                <ErrorCode>OK</ErrorCode>
                <NodeResult>
                    <NodeResultCode>OK</NodeResultCode>
                    <NodeName>IMUMB0899</NodeName>
                    <NodeDetail>
                        <Custom>
                            <Name>Circle</Name>
                            <Value>MU</Value>
                        </Custom>
                        <Custom>
                            <Name>GW VLAN 601</Name>
                            <Value>2405:200:101::</Value>
                        </Custom>
                        <Custom>
                            <Name>GW VLAN 602</Name>
                            <Value>2405:200:104::</Value>
                        </Custom>
                    </NodeDetail>
                    <EnbConfig>
                        <Name>IMUMB0899</Name>
                        <Hardware>1000 SFP</Hardware>
                        <Description>TT</Description>
                        <Site>DND</Site>
                        <Region>DND</Region>
                        <Altitude>0</Altitude>
                        <NbifEventAlarmForwarding>Enabled</NbifEventAlarmForwarding>
                        <ENodeBType>Macro</ENodeBType>
                        <ENodeBID>397063</ENodeBID>
                        <M1SubnetMask>120</M1SubnetMask>
                        <IpRouteList>
                            <IpRoute>
                                <DestIpAddress>172.172.6.20</DestIpAddress>
                                <IpSubnetMask>255.255.255.255</IpSubnetMask>
                                <GatewayIpAddress>172.21.200.1</GatewayIpAddress>
                            </IpRoute>
                            <IpRoute>
                                <DestIpAddress>2405:20:1::</DestIpAddress>
                                <IpSubnetMask>40</IpSubnetMask>
                                <GatewayIpAddress>2405:20:101:4:7:2:61:1</GatewayIpAddress>
                            </IpRoute>
                        </IpRouteList>
                <NodeResult>
     </EnbConfigGetResult>
        </EnbConfigGetResponse>
    </soap:Body>
</soap:Envelope>

I tried below code which extract the Name and IProute, but when i try to merge tham only one IP route i get against NodeName but two are available.

from bs4 import BeautifulSoup
import pandas as pd
import lxml
import xml.etree.cElementTree
import openpyxl
import inspect
import os

sites = "xml"

with open(sites, "r",encoding='unicode_escape') as f:
    xml_data = f.read()

soup = BeautifulSoup(xml_data, "xml")

tag1 = input("Enter tagname1:")
tag2 = input("Enter tagname2:")

data = []
dd = []

for td in soup.find_all(tag1):
    data.append({"NodeName": td.text})
    
for iproute in soup.find_all(tag2):
    dd.append({"IpRoute": iproute.text})

df1 = pd.DataFrame(data)
df2 = pd.DataFrame(dd)
    
df = pd.merge(df1,df2,left_index=True, right_index=True)
    
df.to_excel(sites + '.xlsx', sheet_name='Detail', index = False)
print("*************Done*************")

Expected Output: enter image description here

4
  • What do you enter for tag1 and tag2? Also, why are you storing your results in two separate data structures (data and dd), then merging them later? Don't you want to keep everything together? Commented Sep 18, 2020 at 18:49
  • Your xml is invalid; can you edit your question and correct it? Commented Sep 18, 2020 at 18:50
  • @MattDMo i need the output as shown in snapshot. And i enter tag1=NodeName and tag2 = DestIpAddress Commented Sep 18, 2020 at 18:54
  • @JackFleeting Its a part of big xml file, i edited thi. Commented Sep 18, 2020 at 19:00

3 Answers 3

4

Another method.

from simplified_scrapy import SimplifiedDoc, utils, req
# xml = utils.getFileContent('file.xml')
xml = ''' Your xml string'''

doc = SimplifiedDoc(xml)
lstNodeResult = doc.selects('NodeResult')
data = [['NodeName','DestIpAddress','GatewayIpAddress','value1','value2','value3']]
for result in lstNodeResult:
    lstCustom = result.selects('NodeDetail>Custom')
    if lstCustom:
        lstCustom = lstCustom.Value.text
    NodeName = result.NodeName.text
    lstIpRoute = result.IpRoutes
    for IpRoute in lstIpRoute:
        row = [NodeName,IpRoute.DestIpAddress.text,IpRoute.GatewayIpAddress.text]
        if lstCustom: row.extend(lstCustom)
        data.append(row)
# print (data)
utils.save2csv('test.csv',data)

# Or
data = {
    'NodeName':lstNodeResult.NodeName.text,
    'DestIpAddress':lstNodeResult.select('IpRoute>DestIpAddress>text()'),
    'GatewayIpAddress':lstNodeResult.select('IpRoute>GatewayIpAddress>text()')
}
# print (data)

Result:

.......
I-MU-NVMB-OSC-0900-SMC004,2405:200:310:5a::,2405:200:101:500:7:2:602:3503
I-MU-NVMB-OSC-0900-SMC004,2405:200:310:1::,2405:200:101:500:7:2:602:3503
I-MU-NVMB-ISC-0181-SWC0002,2405:200:310:1::,2405:200:101:500:7:2:602:5d03,MU,2405:200:101:500:7:2:601:5d03,2405:200:101:500:7:2:602:5d03
I-MU-NVMB-ISC-0181-SWC0002,2405:200:311:2::,2405:200:101:500:7:2:602:5d03,MU,2405:200:101:500:7:2:601:5d03,2405:200:101:500:7:2:602:5d03
I-MU-NVMB-ISC-0181-SWC0002,2405:200:310:a152::,2405:200:101:500:7:2:602:5d03,MU,2405:200:101:500:7:2:601:5d03,2405:200:101:500:7:2:602:5d03
.......

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

Delete blank lines.

def delEmptyRows(name, encoding="utf-8"):
    lines = utils.getFileLines(name, encoding=encoding)
    lines = [line for line in lines if line.strip()!='']
    utils.saveFile(name, "".join(lines), encoding=encoding)
# use
delEmptyRows('test.csv')

Method of obtaining IBridge2RfStatsRow data.

from simplified_scrapy import SimplifiedDoc, utils, req
xml = req.get('https://pastebin.com/raw/SWhDM1zq')

rows = []
doc = SimplifiedDoc(xml)
lstIBridge2RfStatsRow = doc.selects('IBridge2RfStatsRow').children
# Add header
header = []
for col in lstIBridge2RfStatsRow[0]:
    header.append(col.tag)
rows.append(header)

# Generate line
for row in lstIBridge2RfStatsRow:
    rows.append([col.text for col in row])

utils.save2csv('IBridge2RfStatsRow.csv',rows)
Sign up to request clarification or add additional context in comments.

16 Comments

Hey @dabingsou Thanks for your response, but there are multiple destIP against each NodeName., how to make that work?
@SunilSharma The number of these destIP is not fixed. What do you want to do with it?
@SunilSharma This problem does exist in windows. I haven't found a good way. Temporarily added a method to delete empty lines.
@SunilSharma I have updated the answer and you can see if it is what you want.
@SunilSharma name = doc.select('StatisticsResult>Name>text()')
|
1

There are certain things you have missed to consider in your code - like you are just doing a soup.find_all('NodeName') - which gives all nodenames present in the xml and soup.find_all('DestIPAddress') gives all destipaddresses present in the xml - how will you map them if you do it that way ?

Try this code, I have made outer loop to go through each noderesult - and inner loop to go through each destipaddress , i am assuming that nodename will be single for each noderesult hence using that same name for every destipaddress found in a particular noderesult :

tag1 = "NodeName"
tag2 = "DestIpAddress"

tag1_list = []
tag2_list = []

for node_res in soup.find_all('NodeResult'):
    
  for tg2 in node_res.find_all(tag2):
      tag1_list.append(soup.find(tag1).text)
      tag2_list.append(tg2.text)

print(tag1_list)
print(tag2_list)

df = pd.DataFrame({
  'NodeNames': tag1_list,
  'IpRoute': tag2_list
})
    
print(df)

DF-output :

   NodeNames       IpRoute
0  IMUMB0899  172.172.6.20
1  IMUMB0899   2405:20:1::

You can add the write-excel line in the end .

7 Comments

Thanks it worked well, i missed one thing i also need GatewayIpAddress against each nodename. And Nodename is single for each Noderesult
hope you got an idea how to add new items from xml to df
Thanks, i got the idea i worked on pandas, but xml is bit new for me
The NodeName in output is coming same for all xml, its taking first NodeName for full xml.
there's a mistake in my answer - tag1_list.append(soup.find(tag1).text) should be changed to tag1_list.append(node_res.find(tag1).text) - then you'll get the proper node-name .
|
1

Since you are dealing with xml and importing lxml, you might as well use it with xpath.

Note that the xml in your question is invalid (even after the editing) and this answer assumes the example was properly edited to be valid.

from lxml import etree
import pandas as pd

soap = """[your xml above, fixed]"""
doc = etree.XML(soap.encode())
columns = ['NodeName','DesIPAdd','GIPAdd','Value1','Value2','Value3']
ns = {'xx' : "http://Airspan.Netspan.WebServices"}
nodes = doc.xpath('//xx:NodeResult', namespaces=ns)
rows = []
for node in nodes:        
    sub_nodes = node.xpath('//xx:EnbConfig/xx:IpRouteList/xx:IpRoute', namespaces=ns)    
    for sub_node in sub_nodes:
        row = []
        row.append(node.xpath('//xx:NodeName/text()', namespaces=ns)[0])        
        row.append(sub_node.xpath('.//xx:DestIpAddress/text()', namespaces=ns)[0])
        row.append(sub_node.xpath('.//xx:GatewayIpAddress/text()', namespaces=ns)[0])
        row.extend(node.xpath('//xx:NodeDetail/xx:Custom/xx:Value/text()', namespaces=ns))
        rows.append(row)
pd.DataFrame(rows,columns=columns)

Output:

      NodeName    DesIPAdd      GIPAdd              Value1  Value2      Value3
0   IMUMB0899   172.172.6.20    172.21.200.1           MU   2405:200:101::  2405:200:104::
1   IMUMB0899   2405:20:1::     2405:20:101:4:7:2:61:1  MU  2405:200:101::  2405:200:104::

3 Comments

I tried this with my xml file, instead of 997 rows, i get 60894 rows, so something happen in the loop,and NodeName is same for all rows. i have loaded part of xml on below 0bin.net/paste/HOf-mVMg#bLRgpGfbezBNX8wrO-MW/…
@SunilSharma I took a quick look at the xml; if I were you, I would give up on the idea of converting it to pandas (or whatever). I would stick with the native xml environment, learn xpath and xquery, use tools like BaseX and go from there.
Yeah but this the requirement, thats why need to convert it into csv.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.