2

I am trying to flatten the following XML data into a CSV-style table.

I can read each Sal element and its attributes, but I can't work out how to combine each SalC element's attributes with those of its parent Sal element to produce one flat row per SalC.

I want to flatten the XML below so that I can write it to a database for further processing. The columns should be:

col1, col2, col3, col4, col5, col6, col7, col8, col9, col10

XML Data:

<Sal col1="a1" col2="C" col3="12/5/2012" col4="a" col5="8" col6="True">
    <SalC col7="A" col8="1" col9="2" col10="True"/>
    <SalC col7="A1" col8="1" col9="2" col10="False"/>
    <SalC col7="B" col8="1" col9="2" col10="False"/>
    <SalC col7="C" col8="1" col9="2" col10="False"/>
    <SalC col7="D" col8="1" col9="2" col10="False"/>
    <SalC col7="E" col8="1" col9="2" col10="False"/>
    <SalC col7="E1" col8="1" col9="2" col10="False"/>
    <SalC col7="F" col8="1" col9="2" col10="False"/>
</Sal>
<Sal col1="a1" col2="C" col3="12/9/2012" col4="b" col5="8" col6="True">
    <SalC col7="A" col8="1" col9="2" col10="False"/>
    <SalC col7="B" col8="1" col9="2" col10="False"/>
    <SalC col7="C" col8="1" col9="2" col10="True"/>
    <SalC col7="D" col8="1" col9="2" col10="False"/>
    <SalC col7="E" col8="1" col9="2" col10="False"/>
</Sal>
<Sal col1="a2" col2="C" col3="12/8/2012" col4="c" col5="15" col6="True">
    <SalC col7="A" col8="1" col9="2" col10="True"/>
    <SalC col7="A1" col8="1" col9="2" col10="False"/>
    <SalC col7="B" col8="1" col9="2" col10="False"/>
    <SalC col7="C" col8="1" col9="2" col10="True"/>
    <SalC col7="D" col8="1" col9="2" col10="False"/>
    <SalC col7="E" col8="1" col9="2" col10="False"/>
    <SalC col7="E1" col8="1" col9="2" col10="True"/>
    <SalC col7="F" col8="1" col9="2" col10="False"/>
</Sal>
<Sal col1="a3" col2="C" col3="12/9/2012" col4="d" col5="8" col6="True">
    <SalC col7="A" col8="1" col9="2" col10="False"/>
    <SalC col7="B" col8="1" col9="2" col10="False"/>
    <SalC col7="C" col8="1" col9="2" col10="False"/>
    <SalC col7="D" col8="1" col9="2" col10="True"/>
    <SalC col7="E" col8="1" col9="2" col10="False"/>
</Sal>

Thank you for your help.

1
  • Did you try BeautifulSoup? Commented Jan 15, 2013 at 8:56

2 Answers

2

This can be solved with XSLT alone, without introducing Python into your workflow. If you do have to use Python, lxml.etree provides the class lxml.etree.XSLT, which you can use to apply the stylesheet.

Assuming your XML data is in a file named xmlfile.xml and is wrapped in a single root element (a document with several top-level Sal elements is not well-formed and will not parse), the code below should work.

xsltfile.xsl

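<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text" />
        <!-- Drop whitespace-only text nodes so the source indentation is not copied to the output -->
        <xsl:strip-space elements="*" />
        <!-- One CSV line per SalC: the parent's attributes first, then its own -->
        <xsl:template match="SalC">
                <xsl:value-of select="concat(../@col1,',', ../@col2,',',../@col3,',',../@col4,',',../@col5,',',../@col6,',',@col7,',',@col8,',',@col9,',',@col10)" />
                <xsl:text>&#10;</xsl:text>
        </xsl:template>
</xsl:stylesheet>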

Example Code

from lxml import etree

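# Compile the stylesheet into a callable transformer
xsltfile = etree.XSLT(etree.parse('xsltfile.xsl'))
# Parse the input document and apply the transformation
xmlfile = etree.parse('xmlfile.xml')
output = xsltfile(xmlfile)
print(output)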
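
The sample in the question has several top-level Sal elements and no enclosing root, so a parser will reject it as-is. A minimal sketch of one way to handle that, assuming the fragments carry no XML declaration of their own: wrap the text in a single element before parsing. The helper name parse_fragments and the tag name root are made up for illustration, the sample is trimmed from the question's data, and the standard library's xml.etree.ElementTree is used here only to keep the sketch dependency-free.

```python
import xml.etree.ElementTree as ET


def parse_fragments(fragment_text, root_tag="root"):
    """Wrap root-less XML fragments in one element and parse the result.

    Hypothetical helper: assumes fragment_text has no <?xml ...?> declaration.
    """
    wrapped = "<{0}>{1}</{0}>".format(root_tag, fragment_text)
    return ET.fromstring(wrapped)


# Trimmed sample: two top-level Sal elements, which is not well-formed on its own.
sample = '''
<Sal col1="a1" col4="a">
    <SalC col7="A"/>
</Sal>
<Sal col1="a2" col4="c">
    <SalC col7="B"/>
</Sal>
'''

root = parse_fragments(sample)
print([sal.get("col1") for sal in root.findall("Sal")])
```

The same wrapped text could be written back to xmlfile.xml before running the transformation.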

0

sal.attrib is dict-like:

row = dict(sal.attrib)

salc.attrib is also dict-like. To "flatten" (or rather, join) the two dicts together, you could use dict.update:

row.update(salc.attrib)
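
As a quick illustration of what that call does, here is a small sketch with plain dicts standing in for sal.attrib and salc.attrib (the values are taken from the first row of the question's data; the variable names are only for illustration). Copying first with dict(...) leaves the parent's attributes untouched.

```python
sal_attrib = {"col1": "a1", "col2": "C"}    # stand-in for sal.attrib
salc_attrib = {"col7": "A", "col8": "1"}    # stand-in for salc.attrib

row = dict(sal_attrib)      # copy, so the parent's attributes are not modified
row.update(salc_attrib)     # add the child's keys; same-named keys would be overwritten

print(row)
print(sal_attrib)           # still holds only the parent's keys
```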

Assuming each SalC element has col7, col8, col9 and col10 attributes, you can just call row.update(salc.attrib) for each salc in sal:


import lxml.etree as ET
import csv

text = '''\
<root>
<Sal col1="a1" col2="C" col3="12/5/2012" col4="a" col5="8" col6="True">
    <SalC col7="A" col8="1" col9="2" col10="True"/>
...
    <SalC col7="D" col8="1" col9="2" col10="True"/>
    <SalC col7="E" col8="1" col9="2" col10="False"/>
</Sal>
</root>'''

fieldnames = ('col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8',
              'col9', 'col10')

# In Python 3 the csv module expects a text-mode file opened with newline=''
with open('/tmp/output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames, delimiter=',', lineterminator='\n')
    writer.writeheader()
    root = ET.fromstring(text)
    for sal in root.xpath('//Sal'):
        row = dict(sal.attrib)       # start from the parent's attributes
        for salc in sal:
            row.update(salc.attrib)  # overlay this child's attributes
            writer.writerow(row)

yields

col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
a1,C,12/5/2012,a,8,True,A,1,2,True
a1,C,12/5/2012,a,8,True,A1,1,2,False
a1,C,12/5/2012,a,8,True,B,1,2,False
...
a3,C,12/9/2012,d,8,True,B,1,2,False
a3,C,12/9/2012,d,8,True,C,1,2,False
a3,C,12/9/2012,d,8,True,D,1,2,True
a3,C,12/9/2012,d,8,True,E,1,2,False
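
The loop above reuses one row dict for all the children of a Sal, which is fine under the stated assumption that every SalC carries all four attributes. If that assumption might not hold, a child with a missing attribute would silently inherit the previous child's value. A hedged sketch of one way around that, using only the standard library: build a fresh dict per SalC and let DictWriter fill the gaps. The function names flatten and to_csv are made up for illustration, and the sample is altered from the question's data so that the second SalC deliberately lacks col9 and col10.

```python
import csv
import io
import xml.etree.ElementTree as ET

PARENT_COLS = ["col1", "col2", "col3", "col4", "col5", "col6"]
CHILD_COLS = ["col7", "col8", "col9", "col10"]


def flatten(xml_text):
    """Return one dict per SalC, combining its attributes with its parent's."""
    root = ET.fromstring(xml_text)
    rows = []
    for sal in root.iter("Sal"):
        for salc in sal.findall("SalC"):
            row = dict(sal.attrib)      # fresh copy for every child
            row.update(salc.attrib)
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text; missing attributes become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, PARENT_COLS + CHILD_COLS, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Altered sample: the second SalC has no col9/col10 (an assumption for the demo).
sample = '''<root>
<Sal col1="a1" col2="C" col3="12/5/2012" col4="a" col5="8" col6="True">
    <SalC col7="A" col8="1" col9="2" col10="True"/>
    <SalC col7="A1" col8="1"/>
</Sal>
</root>'''

print(to_csv(flatten(sample)))
```

Writing to an io.StringIO buffer keeps the sketch free of file paths; passing an open file to csv.DictWriter instead works the same way.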

1 Comment

Thank you very much for explaining the concept and also the answer.
