0

I am trying to iterate over all nodes & child nodes in a tree using ElementTree. I would like to get the all the parent & its child XML tags as columns and values which could append the child nodes to parent in CSV format. I am using python 2.7. The header should be printed only once & below should be respective values

XML File :

<Customers>  
<Customer CustomerID="GREAL">  
      <CompanyName>Great Lakes Food Market</CompanyName>  
      <ContactName>Howard Snyder</ContactName>  
      <ContactTitle>Marketing Manager</ContactTitle>  
      <Phone>(503) 555-7555</Phone>  
      <FullAddress>  
        <Address>2732 Baker Blvd.</Address>  
        <City>Eugene</City>  
        <Region>OR</Region>  
        <PostalCode>97403</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
 </Customer>  
    <Customer CustomerID="HUNGC">  
      <CompanyName>Hungry Coyote Import Store</CompanyName>  
      <ContactName>Yoshi Latimer</ContactName>  
      <ContactTitle>Sales Representative</ContactTitle>  
      <Phone>(503) 555-6874</Phone>  
      <Fax>(503) 555-2376</Fax>  
      <FullAddress>  
        <Address>City Center Plaza 516 Main St.</Address>  
        <City>Elgin</City>  
        <Region>OR</Region>  
        <PostalCode>97827</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
    </Customer>  
    <Customer CustomerID="LAZYK">  
      <CompanyName>Lazy K Kountry Store</CompanyName>  
      <ContactName>John Steel</ContactName>  
      <ContactTitle>Marketing Manager</ContactTitle>  
      <Phone>(509) 555-7969</Phone>  
      <Fax>(509) 555-6221</Fax>  
      <FullAddress>  
        <Address>12 Orchestra Terrace</Address>  
        <City>Walla Walla</City>  
        <Region>WA</Region>  
        <PostalCode>99362</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
    </Customer>  
    <Customer CustomerID="LETSS">  
      <CompanyName>Let's Stop N Shop</CompanyName>  
      <ContactName>Jaime Yorres</ContactName>  
      <ContactTitle>Owner</ContactTitle>  
      <Phone>(415) 555-5938</Phone>  
      <FullAddress>  
        <Address>87 Polk St. Suite 5</Address>  
        <City>San Francisco</City>  
        <Region>CA</Region>  
        <PostalCode>94117</PostalCode>  
        <Country>USA</Country>  
      </FullAddress>  
    </Customer>  
  </Customers>  

My Code:

#Import Libraries
import csv
import xmlschema
import xml.etree.ElementTree as ET

#Define the variable to store the XML Document
xml_file = 'C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/CustomersOrders.xml'

#using XML Schema Library validate the XML against XSD
my_schema = xmlschema.XMLSchema('C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/CustomersOrders.xsd')
SchemaCheck = my_schema.is_valid(xml_file)
print(SchemaCheck) #Prints as True if the document is validated with XSD

#Parse XML & get root
tree = ET.parse(xml_file)
root = tree.getroot()

#Create & Open CSV file
xml_data_to_csv = open('C:/Users/391648/Desktop/BOSS_20190618_20190516_18062019141928_CUMA/source_Files_XML/PythonXMl.csv','w')

#create variable to write to csv
csvWriter = csv.writer(xml_data_to_csv)

#Create list contains header
count =0

#Loop for each node
for element in root.findall('Customers/Customer'):
    List_nodes = []

    #Get head by Tag
    if count ==0:
        list_header =[]
        Full_Address = []
        CompanyName = element.find('CompanyName').tag
        list_header.append(CompanyName)

        ContactName = element.find('ContactName').tag
        list_header.append(ContactName)

        ContactTitle = element.find('ContactTitle').tag
        list_header.append(ContactTitle)

        Phone = element.find('Phone').tag
        list_header.append(Phone)

        print(list_header)
        csvWriter.writerow(list_header)

        count = count + 1

    #Get the data of the Node
    CompanyName = element.find('CompanyName').text
    List_nodes.append(CompanyName)

    ContactName = element.find('ContactName').text
    List_nodes.append(ContactName)

    ContactTitle = element.find('ContactTitle').text
    List_nodes.append(ContactTitle)

    Phone = element.find('Phone').text
    List_nodes.append(Phone)

    print(List_nodes)

    #Write List_Nodes to CSV
    csvWriter.writerow(List_nodes)

xml_data_to_csv.close()
Expected CSV output:

CompanyName,ContactName,ContactTitle,Phone, Address, City, Region, PostalCode, Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555, City Center Plaza 516 Main St., Elgin, OR, 97827, USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874, 12 Orchestra Terrace, Walla Walla, WA, 99362, USA
1
  • Should not it be root.findall('Customer')? Commented Aug 21, 2019 at 10:40

3 Answers 3

2

You might be better off using lxml. It has most of the desired functionality for finding elements built in.

from lxml import etree
import csv

with open('file.xml') as fp:
    xml = etree.fromstring(fp.read())

field_dict = {
    'CompanyName': 'CompanyName',
    'ContactName': 'ContactName',
    'ContactTitle': 'ContactTitle',
    'Phone': 'Phone',
    'Address': 'FullAddress/Address',
    'City': 'FullAddress/City',
    'Region': 'FullAddress/Region',
    'PostalCode': 'FullAddress/PostalCode',
    'Country': 'FullAddress/Country'
}

customers = []
for customer in xml:
    line = {k: customer.find(v).text for k, v in field_dict.items()}
    customers.append(line)

with open('customers.csv', 'w') as fp:
    writer = csv.DictWriter(fp, field_dict)
    writer.writerows(customers)
Sign up to request clarification or add additional context in comments.

Comments

1

You can use xmltodict to convert data to JSON format instead of parsing XML:

import xmltodict
import pandas as pd

with open('data.xml', 'r') as f:
    data = xmltodict.parse(f.read())['Customers']['Customer']

data_pd = {'CompanyName': [i['CompanyName'] for i in data],
           'ContactName': [i['ContactName'] for i in data],
           'ContactTitle': [i['ContactTitle'] for i in data],
           'Phone': [i['Phone'] for i in data],
           'Address': [i['FullAddress']['Address'] for i in data],
           'City': [i['FullAddress']['City'] for i in data],
           'Region': [i['FullAddress']['Region'] for i in data],
           'PostalCode': [i['FullAddress']['PostalCode'] for i in data],
           'Country': [i['FullAddress']['Country'] for i in data]}

df = pd.DataFrame(data_pd)
df.to_csv('result.csv', index=False)

Output CSV file:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyder,Marketing Manager,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latimer,Sales Representative,(503) 555-6874,City Center Plaza 516 Main St.,Elgin,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manager,(509) 555-7969,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owner,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA

2 Comments

Hi, I get following error while running your code snippet: Traceback (most recent call last): File"C:/Users/391648/PycharmProjects/XSD_XMl_Parse/Parse_XML_Df_Pandas.py", line 5, in <module> data = xmltodict.parse(f.read())['Customers']['Customer'] KeyError: 'Customers' Process finished with exit code 1. What could be the cause of it ?
Have you tried it with the same XML as in your question?
0

A couple of things I have changed:

  • Removed schema validation since I do not have the XSD. You may include it
  • Made the child node traversal dynamic instead of statically referring each child node
  • The main for loop condition changed to for customer in root.findall('Customer') from for customer in root.findall('Customers/Customer')

However, I tried to keep your program structure, library usage intact. Here is the modified program:

import xml.etree.ElementTree as et
import csv

tree = et.parse("../data/customers.xml")
root = tree.getroot()
headers = []
count = 0
xml_data_to_csv = open('../data/customers.csv', 'w')

csvWriter = csv.writer(xml_data_to_csv)
for customer in root.findall('Customer'):
    data = []
    for detail in customer:
        if(detail.tag == 'FullAddress'):
            for addresspart in detail:
                data.append(addresspart.text.rstrip('/n/r'))
                if(count == 0):
                    headers.append(addresspart.tag)
        else:
            data.append(detail.text.rstrip('/n/r'))
            if(count == 0):
                headers.append(detail.tag)
    if(count == 0):
        csvWriter.writerow(headers)
    csvWriter.writerow(data)
    count = count + 1

With the given input XML content it produces:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyde,Marketing Manage,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latime,Sales Representative,(503) 555-6874,(503) 555-2376,City Center Plaza 516 Main St.,Elgi,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manage,(509) 555-7969,(509) 555-6221,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owne,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA

Note: Instead of writing to CSV in the loop you may append to an array and write it at one go. It depends on your content size and performance.


Update: When you have customers and their orders in the XML

The XML processing and CSV writing code structure remains the same. Additionally, process Orders element while processing customers. Now, under Orders Order elements can be processed exactly like Customer. As you mentioned each Order has ShipInfo as well.

The input XML is assumed to be (based on the comment below):

<Customers>
    <Customer CustomerID="GREAL">
        <CompanyName>Great Lakes Food Market</CompanyName>
        <ContactName>Howard Snyder</ContactName>
        <ContactTitle>Marketing Manager</ContactTitle>
        <Phone>(503) 555-7555</Phone>
        <FullAddress>
            <Address>2732 Baker Blvd.</Address>
            <City>Eugene</City>
            <Region>OR</Region>
            <PostalCode>97403</PostalCode>
            <Country>USA</Country>
        </FullAddress>
        <Orders>
            <Order>
                <Param1>Value1</Param1>
                <Param2>Value2</Param2>
                <ShipInfo>
                    <ShipInfoParam1>Value3</ShipInfoParam1>
                    <ShipInfoParam2>Value4</ShipInfoParam2>
                </ShipInfo>
            </Order>
            <Order>
                <Param1>Value5</Param1>
                <Param2>Value6</Param2>
                <ShipInfo>
                    <ShipInfoParam1>Value7</ShipInfoParam1>
                    <ShipInfoParam2>Value8</ShipInfoParam2>
                </ShipInfo>
            </Order>
        </Orders>
    </Customer>
    <Customer CustomerID="HUNGC">
        <CompanyName>Hungry Coyote Import Store</CompanyName>
        <ContactName>Yoshi Latimer</ContactName>
        <ContactTitle>Sales Representative</ContactTitle>
        <Phone>(503) 555-6874</Phone>
        <Fax>(503) 555-2376</Fax>
        <FullAddress>
            <Address>City Center Plaza 516 Main St.</Address>
            <City>Elgin</City>
            <Region>OR</Region>
            <PostalCode>97827</PostalCode>
            <Country>USA</Country>
        </FullAddress>
        <Orders>
            <Order>
                <Param1>Value7</Param1>
                <Param2>Value8</Param2>
                <ShipInfo>
                    <ShipInfoParam1>Value9</ShipInfoParam1>
                    <ShipInfoParam2>Value10</ShipInfoParam2>
                </ShipInfo>
            </Order>
        </Orders>
    </Customer>
    <Customer CustomerID="LAZYK">
        <CompanyName>Lazy K Kountry Store</CompanyName>
        <ContactName>John Steel</ContactName>
        <ContactTitle>Marketing Manager</ContactTitle>
        <Phone>(509) 555-7969</Phone>
        <Fax>(509) 555-6221</Fax>
        <FullAddress>
            <Address>12 Orchestra Terrace</Address>
            <City>Walla Walla</City>
            <Region>WA</Region>
            <PostalCode>99362</PostalCode>
            <Country>USA</Country>
        </FullAddress>
    </Customer>
    <Customer CustomerID="LETSS">
        <CompanyName>Let's Stop N Shop</CompanyName>
        <ContactName>Jaime Yorres</ContactName>
        <ContactTitle>Owner</ContactTitle>
        <Phone>(415) 555-5938</Phone>
        <FullAddress>
            <Address>87 Polk St. Suite 5</Address>
            <City>San Francisco</City>
            <Region>CA</Region>
            <PostalCode>94117</PostalCode>
            <Country>USA</Country>
        </FullAddress>
    </Customer>
</Customers>

Here is the modified code that process both customers and orders:

import xml.etree.ElementTree as et
import csv

tree = et.parse("../data/customers-with-orders.xml")
root = tree.getroot()

customer_csv = open('../data/customers-part.csv', 'w')
order_csv = open('../data/orders-part.csv', 'w')

customerCsvWriter = csv.writer(customer_csv)
orderCsvWriter = csv.writer(order_csv)

customerHeaders = []
orderHeaders = ['CustomerID']
isFirstCustomer = True
isFirstOrder = True


def processOrders(customerId):
    global isFirstOrder
    for order in detail.findall('Order'):
        orderData = [customerId]
        for orderdetail in order:
            if(orderdetail.tag == 'ShipInfo'):
                for shipinfopart in orderdetail:
                    orderData.append(shipinfopart.text.rstrip('/n/r'))
                    if(isFirstOrder):
                        orderHeaders.append(shipinfopart.tag)
            else:
                orderData.append(orderdetail.text.rstrip('/n/r'))
                if(isFirstOrder):
                    orderHeaders.append(orderdetail.tag)
        if(isFirstOrder):
            orderCsvWriter.writerow(orderHeaders)
        orderCsvWriter.writerow(orderData)
        isFirstOrder = False


for customer in root.findall('Customer'):
    customerData = []
    customerId = customer.get('CustomerID')
    for detail in customer:
        if(detail.tag == 'FullAddress'):
            for addresspart in detail:
                customerData.append(addresspart.text.rstrip('/n/r'))
                if(isFirstCustomer):
                    customerHeaders.append(addresspart.tag)
        elif(detail.tag == 'Orders'):
            processOrders(customerId)
        else:
            customerData.append(detail.text.rstrip('/n/r'))
            if(isFirstCustomer):
                customerHeaders.append(detail.tag)
    if(isFirstCustomer):
        customerCsvWriter.writerow(customerHeaders)
    customerCsvWriter.writerow(customerData)
    isFirstCustomer = False

Output produced in customers-part.csv:

CompanyName,ContactName,ContactTitle,Phone,Address,City,Region,PostalCode,Country
Great Lakes Food Market,Howard Snyde,Marketing Manage,(503) 555-7555,2732 Baker Blvd.,Eugene,OR,97403,USA
Hungry Coyote Import Store,Yoshi Latime,Sales Representative,(503) 555-6874,(503) 555-2376,City Center Plaza 516 Main St.,Elgi,OR,97827,USA
Lazy K Kountry Store,John Steel,Marketing Manage,(509) 555-7969,(509) 555-6221,12 Orchestra Terrace,Walla Walla,WA,99362,USA
Let's Stop N Shop,Jaime Yorres,Owne,(415) 555-5938,87 Polk St. Suite 5,San Francisco,CA,94117,USA

Output produced in orders-part.csv:

CustomerID,Param1,Param2,ShipInfoParam1,ShipInfoParam2
GREAL,Value1,Value2,Value3,Value4
GREAL,Value5,Value6,Value7,Value8
HUNGC,Value7,Value8,Value9,Value10

Note: the code can be optimized further by reusing. I am leaving that part to you. Secondly, notice that in each order customer Id is added in order to distinguish.

5 Comments

Hi, Thank you for the code snippet. If there are multiple elements that have to be clubbed in a single CSV file or multiple CSV files then how should I approach? Lets say we have customers & order data set in XML. 1. customer has multiple child & well as order has multiple but can we include both the elements data in one CSV just next to each other or different CSV.
I suggest you go for two different CSV files for customer and order data. The parsing logic code structure remains the same as shown above. When you encounter the 'Order` element under a Customer, write to a different CSV. Also, take care care of Order CSV header.
In the same XML file, there is Orders element just like Customers which has sub-element. example: 1. Customers > Customer > Full Address 2. Orders > Order > ShipInfo. I tried to add the same parsing logic in the same code but it's not working. Do i need to do it differently ?
@RameezShaikh It is unclear when you say it's not working. Please see the updated answer (below part) to see how both customers and orders are processed and output is written into two separate files.
Thank you for posting the code snippet. I have modified the code as per my need & it worked. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.