
I have an XML file with a particular structure, and I need to convert it to a CSV file using a Python script. This is part of my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<results version="2">
    <cppcheck version="2.9"/>
    <errors>
        <error identifier="redundantAssignment" errorStyle="style" msg="Variable &apos;ret&apos; is reassigned a value before the old one has been used.">
            <location file="D:\test\main.c" line="64" column="8" info="ret is overwritten"/>
            <location file="D:\test\main.c" line="62" column="8" info="ret is assigned"/>
            <symbol>ret</symbol>
        </error>
        <error identifier="redundantAssignment" errorStyle="style" msg="Variable &apos;ret&apos; is reassigned a value before the old one has been used.">
            <location file="D:\test\data.c" line="93" column="8" info="ret is overwritten"/>
            <location file="D:\test\data.c" line="91" column="8" info="ret is assigned"/>
            <symbol>ret</symbol>
        </error>
    </errors>
</results>

I'm using this script, but it doesn't work for me:

import xml.etree.ElementTree as ET
import csv

# PARSE XML
xml = ET.parse("./error.xml")
root = xml.getElementsByTagName()

# CREATE CSV FILE
csvfile = open("data.csv",'w',encoding='utf-8')
csvfile_writer = csv.writer(csvfile)

# ADD THE HEADER TO CSV FILE
csvfile_writer.writerow(["identifier","file","errorStyle","msg"])

# FOR EACH EMPLOYEE
for error in root.findall("errors/error"):
    
    if(error):
       # EXTRACT EMPLOYEE DETAILS  
      identifier = error.get('identifier')
      file = error.find('file')
      errorStyle = error.find("errorStyle")
      msg = error.find("msg")
      csv_line = [identifier, file.text, errorStyle.text, msg.text]
      
      # ADD A NEW ROW TO CSV FILE
      csvfile_writer.writerow(csv_line)
csvfile.close()

2 Answers


Please refer to the code below:

import xml.etree.ElementTree as ET
import csv

xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<results version="2">
    <cppcheck version="2.9"/>
    <errors>
        <error identifier="redundantAssignment" errorStyle="style" msg="Variable &apos;ret&apos; is reassigned a value before the old one has been used.">
            <location file="Din.c" line="64" column="8" info="ret is overwritten"/>
            <location file="D.c" line="62" column="8" info="ret is assigned"/>
            <symbol>ret</symbol>
        </error>
        <error identifier="redundantAssignment" errorStyle="style" msg="Variable &apos;ret&apos; is reassigned a value before the old one has been used.">
            <location file="Dta.c" line="93" column="8" info="ret is overwritten"/>
            <location file="Dta.c" line="91" column="8" info="ret is assigned"/>
            <symbol>ret</symbol>
        </error>
    </errors>
</results>"""

root = ET.fromstring(xml_data)

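# newline="" keeps the csv module from writing blank rows on Windows
with open("data.csv", "w", newline="", encoding="utf-8") as csvfile:
    csvfile_writer = csv.writer(csvfile)
    csvfile_writer.writerow(["msg", "identifier", "errorStyle"])

    # <errors> is the only child of <results> that has children, so the
    # inner loop visits each <error> element
    for child in root:
        for item in child:
            csv_line = [item.attrib["msg"], item.attrib["identifier"], item.attrib["errorStyle"]]
            csvfile_writer.writerow(csv_line)
            print(item.attrib)  # print() is a function in Python 3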

Hope this helps, Thanks.


3 Comments

This solution works fine, many thanks. But I need to keep the XML in a separate file to be parsed; when I replace the hardcoded XML with parsing from a file, it doesn't work.
Hi monaco, with the code I posted, use the line below to read your file (with the XML content in it) into a string: xml_data = open('file.xml', 'r').read() Here 'file.xml' is my XML file. Please accept the answer once you get your desired result.
Yes, this works fine. Thanks for your help. How can I get the value of the file attribute, I mean the one on the nested sub-element?
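
On the last comment: the file attribute lives on the <location> children of each <error>, not on <error> itself. Here is a minimal sketch of one way to get at it, writing one CSV row per location. The names SAMPLE_XML, location_rows and write_csv are made up for illustration, the sample is a shortened copy of the XML from the question, and the choice of one row per location is an assumption about the desired output.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the cppcheck report from the question.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<results version="2">
    <cppcheck version="2.9"/>
    <errors>
        <error identifier="redundantAssignment" errorStyle="style" msg="Variable &apos;ret&apos; is reassigned.">
            <location file="main.c" line="64" column="8" info="ret is overwritten"/>
            <location file="main.c" line="62" column="8" info="ret is assigned"/>
            <symbol>ret</symbol>
        </error>
    </errors>
</results>"""


def location_rows(root):
    """Yield one [identifier, errorStyle, msg, file, line] row per <location>."""
    for error in root.iter("error"):
        for location in error.findall("location"):
            yield [
                error.get("identifier", ""),
                error.get("errorStyle", ""),
                error.get("msg", ""),
                location.get("file", ""),   # attribute of the nested element
                location.get("line", ""),
            ]


def write_csv(rows, out):
    """Write a header plus the given rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["identifier", "errorStyle", "msg", "file", "line"])
    writer.writerows(rows)


# For a file on disk, ET.parse("file.xml").getroot() gives the same kind of root.
root = ET.fromstring(SAMPLE_XML)
buffer = io.StringIO()
write_csv(location_rows(root), buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with open("data.csv", "w", newline="", encoding="utf-8") to write_csv. If only the first location of each error is wanted, error.find("location") returns it (or None when there is none).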

Note: this is not an answer to the original question, but a useful example depending on the XML structure. I had trouble with both the csv and pandas modules when trying to force CSV output as quoted text fields (I got either no quotes or triple quotes). For a fairly simple XML file where you just want to convert either attributes or subelements to CSV, I came up with a solution that uses only plain file I/O, str.format and a list comprehension:

import os
import xml.etree.ElementTree as ET

def items2csv(root, csv_path):
    attributes = ['name', 'gps', 'country', 'year', 'notes']
    csvfile = open(csv_path, 'w')
    
    # Column headers
    line = '"{}", "{}", "{}", "{}", "{}"'.format(*attributes)
    csvfile.write(line)

    for parent in root:
        values = [parent.get(attrib, '') for attrib in attributes]  # '' when the attribute is missing
        line = '\n"{}", "{}", "{}", "{}", "{}"'.format(*values)
        csvfile.write(line)
    csvfile.close()

xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<tree>
    <item name="Name1" gps="24.227191 35.573413" country="Egypt" year="2004"></item>
    <item name="Name2" gps="24.228596 35.573733" country="Egypt" year="2004"></item>
    <item name="Name3" gps="24.253222 35.539939" country="Egypt" year="2004"></item>
    <item name="Name4" gps="25.429583 34.694408" country="Egypt" year="2007" notes="https://www.blabla.com "></item>
    <item name="Name5" gps="25.309756 34.860375" country="Egypt" year="2007"></item>
</tree>"""   

#root = ET.parse('test.xml').getroot() # from file
root = ET.fromstring(xml_data) # from variable

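# abspath avoids an empty dirname (and a path of "/test_output.csv") when __file__ is relative
items2csv(root, os.path.join(os.path.dirname(os.path.abspath(__file__)), "test_output.csv"))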

wait = input("Press Enter to Exit.")

And an example for a simple XML structure based on subelements:

#!/usr/bin/python
import os
import xml.etree.ElementTree as ET

def items2csv(root, csv_path):
    tags = ['name', 'gps', 'country', 'year', 'notes']
    csvfile = open(csv_path, 'w')
    
    # Column headers
    line = '"{}", "{}", "{}", "{}", "{}"'.format(*tags)
    csvfile.write(line)

    for parent in root:
        values = [parent.findtext(tag, default='') for tag in tags]  # '' when the subelement is missing
        line = '\n"{}", "{}", "{}", "{}", "{}"'.format(*values)
        csvfile.write(line)
    csvfile.close()

xml_data = """<?xml version="1.0" encoding="UTF-8"?>
<tree>
    <item>
        <name>Name1</name>
        <gps>24.227191 35.573413</gps>
        <country>Egypt</country>
        <year>2004</year>
    </item>
    <item>
        <name>Name2</name>
        <gps>24.228596 35.573733</gps>
        <country>Egypt</country>
        <year>2004</year>
    </item>
    <item>
        <name>Name3</name>
        <gps>24.253222 35.539939</gps>
        <country>Egypt</country>
        <year>2004</year>
    </item>
    <item>
        <name>Name4</name>
        <gps>25.429583 34.694408</gps>
        <country>Egypt</country>
        <year>2007</year>
        <notes>https://www.blabla.com</notes>
    </item>
    <item>
        <name>Name5</name>
        <gps>25.309756 34.860375</gps>
        <country>Egypt</country>
        <year>2007</year>
    </item>
</tree>"""   

#root = ET.parse('test.xml').getroot() # from file
root = ET.fromstring(xml_data) # from variable

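# abspath avoids an empty dirname (and a path of "/test_output.csv") when __file__ is relative
items2csv(root, os.path.join(os.path.dirname(os.path.abspath(__file__)), "test_output.csv"))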

wait = input("Press Enter to Exit.")
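
For comparison, the csv module can quote every field on its own via quoting=csv.QUOTE_ALL, and it also escapes double quotes that appear inside a value, which the hand-written format strings above do not. This is only a sketch under assumptions: items_to_csv and FIELDS are illustrative names, the second sample item is invented to show the escaping, and whether this resolves the quoting trouble described at the top of this answer depends on the exact setup.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["name", "gps", "country", "year", "notes"]


def items_to_csv(root, out, fields=FIELDS):
    """Write each child's attributes as a fully quoted CSV row."""
    # QUOTE_ALL wraps every field in double quotes, including empty ones.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    for item in root:
        writer.writerow([item.get(field, "") for field in fields])


# The second item is made up: it has an embedded double quote and a comma.
xml_data = """<tree>
    <item name="Name1" gps="24.227191 35.573413" country="Egypt" year="2004"/>
    <item name='Say "hi"' gps="25.429583 34.694408" country="Egypt" year="2007" notes="a, b"/>
</tree>"""

out = io.StringIO()
items_to_csv(ET.fromstring(xml_data), out)
csv_text = out.getvalue()
print(csv_text)
```

The embedded double quote comes out doubled ("Say ""hi"""), which is the usual CSV escaping. When writing to a real file, open it with newline="" so the csv module controls the line endings.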
