XML to CSV Python

Question

The XML data (file.xml) looks like the following:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<Activity_Logs xsi:schemaLocation="http://www.cisco.com/PowerKEYDVB/Auditing 
DailyActivityLog.xsd" To="2018-04-01" From="2018-04-01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
       <time>2015-09-16T04:13:20Z</time>
       <oper>Create_Product</oper>
       <pkgEid>10</pkgEid>
       <pkgName>BBCWRL</pkgName>
       </ActivityRecord>
    <ActivityRecord>
       <time>2015-09-16T04:13:20Z</time>
       <oper>Create_Product</oper>
       <pkgEid>18</pkgEid>
       <pkgName>CNNINT</pkgName>
    </ActivityRecord>

The following Python code is meant to parse the XML file above and convert it to CSV.

import csv
import xml.etree.cElementTree as ET


tree =  ET.parse('file.xml')
root = tree.getroot()


data_to_csv= open('output.csv','w')

list_head=[]

Csv_writer=csv.writer(data_to_csv)

count=0
for elements in root.findall('ActivityRecord'):
    List_node = []
    if count == 0 :

        time = elements.find('time').tag
        list_head.append(time)

        oper = elements.find('oper').tag
        list_head.append(oper)

        pkgEid = elements.find('pkgEid').tag
        list_head.append(pkgEid)


        pkgName = elements.find('pkgName').tag
        list_head.append(pkgName)

        Csv_writer.writerow(list_head)
        count = +1

    time = elements.find('time').text
    List_node.append(time)

    oper = elements.find('oper').text
    List_node.append(oper)

    pkgEid = elements.find('pkgEid').text
    List_node.append(pkgEid)

    pkgName = elements.find('pkgName').text
    List_node.append(pkgName)    

    Csv_writer.writerow(List_node)

data_to_csv.close()

The code I am using is not writing any data to the CSV. Could someone tell me where exactly I am going wrong?

What elements do you want to extract from the XML file? Please give more details.
– Rachit kapadia, Commented Apr 18, 2018 at 11:46
time, oper, pkgEid, and pkgName are the elements I want to extract.
– Nipun khanna, Commented Apr 18, 2018 at 12:12

Rachit kapadia · Accepted Answer · 2018-04-19 10:27:25Z

6

Using pandas and BeautifulSoup you can achieve your expected output easily:

#Code:

import itertools

import pandas as pd
from bs4 import BeautifulSoup as b

with open("file.xml", "r") as f:  # opening xml file
    content = f.read()

# The "lxml" (HTML) parser lowercases tag names, hence "pkgeid" and not "pkgEid".
soup = b(content, "lxml")
pkgeid = [values.text for values in soup.find_all("pkgeid")]
pkgname = [values.text for values in soup.find_all("pkgname")]
time = [values.text for values in soup.find_all("time")]
oper = [values.text for values in soup.find_all("oper")]
# For python-3.x use the `zip_longest` method
# For python-2.x use the `izip_longest` method
data = list(itertools.zip_longest(time, oper, pkgeid, pkgname))
df = pd.DataFrame(data=data)
df.to_csv("sample.csv", index=False, header=False)

#output in `sample.csv` file will be as follows:
2015-09-16T04:13:20Z,Create_Product,10,BBCWRL
2015-09-16T04:13:20Z,Create_Product,18,CNNINT
2018-04-01T03:30:28Z,Deactivate_Dhct,,
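One caveat worth checking against your own data: because each column is collected independently and then zipped, the rows only line up when the records with missing fields come last, as in the output above. The sketch below uses made-up placeholder lists (not values from the real file) to show what happens when a record in the middle has no pkgEid/pkgName.

```python
from itertools import zip_longest

# Hypothetical per-tag lists, as if each had been collected on its own.
# There are three records, and the SECOND one has no pkgEid/pkgName.
time = ["t1", "t2", "t3"]
oper = ["Create_Product", "Deactivate_Dhct", "Create_Product"]
pkgeid = ["10", "18"]            # only two values for three records
pkgname = ["BBCWRL", "CNNINT"]   # only two values for three records

rows = list(zip_longest(time, oper, pkgeid, pkgname, fillvalue=""))
for row in rows:
    print(row)

# The second row picks up "18"/"CNNINT", which belong to the third record,
# and the third row ends up with the blanks.
```

If that ordering can occur in your file, extracting the fields record by record avoids the shift.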

edited Apr 19, 2018 at 10:27

answered Apr 18, 2018 at 12:40

Rachit kapadia

15 Comments

Nipun khanna Over a year ago

I get this error: AttributeError: 'NoneType' object has no attribute 'text'

Nipun khanna Over a year ago

Further down in the XML there are records where 'pkgeid' can be blank.

Rachit kapadia Over a year ago

If it is blank, then the field will be empty.

Rachit kapadia Over a year ago

Also, please check the filename; my code originally used a different filename, though I have since changed it.

Nipun khanna Over a year ago

<ActivityRecord> <time>2018-04-01T03:30:28Z</time> <oper>Deactivate_Dhct</oper> <dhct>18:55:0F:47:03:2D</dhct> </ActivityRecord>
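For what it's worth, the two problems in this thread can be reproduced with the standard library alone. The root element declares a default namespace, so with xml.etree.ElementTree an unqualified findall('ActivityRecord') matches nothing, which would explain the empty CSV in the question. And calling .text on the result of find() raises the AttributeError above whenever a field is absent. Below is a minimal sketch, assuming the namespace URI shown in the question; the SAMPLE string, the "a" prefix and the records_to_csv helper are names made up for the illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed-down sample modeled on the records quoted in this thread.
SAMPLE = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
    <ActivityRecord>
        <time>2015-09-16T04:13:20Z</time>
        <oper>Create_Product</oper>
        <pkgEid>10</pkgEid>
        <pkgName>BBCWRL</pkgName>
    </ActivityRecord>
    <ActivityRecord>
        <time>2018-04-01T03:30:28Z</time>
        <oper>Deactivate_Dhct</oper>
        <dhct>18:55:0F:47:03:2D</dhct>
    </ActivityRecord>
</Activity_Logs>"""

# Prefix "a" is arbitrary; it only has to map to the document's namespace URI.
NS = {"a": "http://www.cisco.com/PowerKEYDVB/Auditing"}
FIELDS = ["time", "oper", "pkgEid", "pkgName"]

# An unqualified name finds no records because every tag is namespaced.
unqualified = ET.fromstring(SAMPLE).findall("ActivityRecord")
print(len(unqualified))


def records_to_csv(xml_text):
    """Return CSV text with one row per ActivityRecord."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for record in root.findall("a:ActivityRecord", NS):
        # findtext() returns the default instead of None when a field is absent.
        writer.writerow(
            [record.findtext(f"a:{name}", default="", namespaces=NS) for name in FIELDS]
        )
    return out.getvalue()


csv_text = records_to_csv(SAMPLE)
print(csv_text)
```

To write a real file instead of a string, pass a file opened with open('output.csv', 'w', newline='') to csv.writer and use ET.parse('file.xml').getroot() in place of ET.fromstring.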


Willian Vieira · Accepted Answer · 2018-04-24 18:35:36Z

6

Using pandas, parsing all XML fields:

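import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file.xml")
root = tree.getroot()

# One dict per record: {child tag: child text}. The file declares a default
# namespace, so ElementTree reports tags as "{namespace-uri}time"; keep only
# the local name so the CSV columns read "time", "oper", and so on.
rows = [{child.tag.split('}')[-1]: child.text for child in record} for record in root]

# Fields missing from a record become empty cells.
df = pd.DataFrame(rows)
df.to_csv('file.csv')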
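If pandas is not available, the same "take every child element of every record" idea can be sketched with the csv module. This is only an illustration under assumptions: the SAMPLE string is a made-up two-record excerpt modeled on the data in this thread, and local_name and all_fields_to_csv are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical two-record sample modeled on the question's data.
SAMPLE = """<Activity_Logs xmlns="http://www.cisco.com/PowerKEYDVB/Auditing">
  <ActivityRecord>
    <time>2015-09-16T04:13:20Z</time>
    <oper>Create_Product</oper>
    <pkgEid>10</pkgEid>
    <pkgName>BBCWRL</pkgName>
  </ActivityRecord>
  <ActivityRecord>
    <time>2018-04-01T03:30:28Z</time>
    <oper>Deactivate_Dhct</oper>
    <dhct>18:55:0F:47:03:2D</dhct>
  </ActivityRecord>
</Activity_Logs>"""


def local_name(tag):
    """Drop the "{namespace-uri}" prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def all_fields_to_csv(xml_text):
    """Return CSV text whose columns are the union of all record fields."""
    root = ET.fromstring(xml_text)
    records = [
        {local_name(child.tag): child.text for child in record} for record in root
    ]

    # Column order follows the first appearance of each field across records.
    fieldnames = []
    for record in records:
        for name in record:
            if name not in fieldnames:
                fieldnames.append(name)

    out = io.StringIO()
    # restval fills in fields that a given record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


csv_text = all_fields_to_csv(SAMPLE)
print(csv_text)
```

Unlike a fixed column list, this picks up fields such as dhct automatically, at the cost of reading all records before the header can be written.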

answered Apr 24, 2018 at 18:35

Willian Vieira

tector · Accepted Answer · 2021-07-08 09:14:24Z

3

Answer for 2021:
As of pandas 1.3.0, you can use pandas to read XML and write CSV:
https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.3.0.html#read-and-write-xml-documents

import pandas as pd
df = pd.read_xml(<xml_or_xml_filepath>)
# ...
df.to_csv(<csv_filepath>)

For more details on usage, see the official documentation: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.read_xml.html

answered Jul 8, 2021 at 9:14

tector

Vishnu Kiran · Accepted Answer · 2019-03-13 23:48:52Z

1

Use pyxmlparser if it is a one-time operation.

Disclaimer: I am the author of the library, and it is fairly new. Any feedback is appreciated. It is a command-line utility.

https://pypi.org/project/pyxmlparser/

answered Mar 13, 2019 at 23:48

Vishnu Kiran

1 Comment

tursunWali Over a year ago

oww, you did not provide examples of usage

milenakowalska · Accepted Answer · 2021-03-05 08:22:48Z

Found the most appropriate way of doing this:

import pandas as pd
from bs4 import BeautifulSoup

# The original snippet looped over an undefined `files_xlm`; listing the input
# files explicitly here is an assumption. Add more paths as needed.
xml_files = ["file.xml"]

# The "lxml" (HTML) parser lowercases tag names, so look fields up in lowercase.
fields = ["time", "oper", "pkgeid", "pkgname", "dhct", "sourceid"]

rows = []
for each_file in xml_files:
    with open(each_file, "r") as f:  # opening xml file
        soup = BeautifulSoup(f.read(), "lxml")

    for record in soup.find_all("activityrecord"):
        row = []
        for field in fields:
            node = record.find(field)
            # A record may lack some fields; use an empty string for those.
            row.append(node.text if node is not None else "")
        rows.append(row)

# Building the frame from lists avoids joining and re-splitting on ',',
# which would break if a value ever contained a comma.
df = pd.DataFrame(rows, columns=['Time', 'Oper', 'PkgEid', 'PkgName', 'dhct', 'sourceid'])
df.to_csv("new.csv", index=False)
