I need to flatten the data in an XML file so that I can read it as a single table, i.e. a CSV. I found some Python 2.7 examples using ElementTree, but so far I could not adapt them to work further down the tree; they only collect the highest-level elements. What I want is to repeat the higher-level element's attributes on each of its rows and pick up the rest from the nested elements.

I know I could and should RTFM, but sadly I need to solve this ASAP.

Maybe the xsd file linked could help?

My data looks like this:

<!-- MoneyMate (tm) XMLPerfs Application version 1.0.1.1 - Copyright © 2000 MoneyMate Limited. All Rights Reserved. MoneyMate ® -->
<!-- Discrete Perfs for 180 periods for Monthly frequency -->
<MONEYMATE_XML_FEED xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://mmia2.moneymate.com/xml/MoneyMateComplete.xsd" version="1.0" calcCurrency="SEK">
<TYPES>
<TYPE typeCountry="SE" typeId="85" typeName="string" calcToDate="2013-07-16">
<COMPANIES>
<COMPANY companyId="25000068" companyName="string"/>
…

<CATEGORIES>
<CATEGORY categoryId="1101" categoryName="Aktie -- Asien">
<FUNDS>
<FUND fundId="6201" fundName="string" fundCurrency="GBP" fundCompanyId="25000068"><PERFORMANCES><MONTHLYPERFS><PERFORMANCEMONTH perfEndMonth="2006-05-31" perfMonth="-0.087670"/><PERFORMANCEMONTH>
…
</PERFORMANCES></FUND></FUNDS>
</CATEGORY>
<CATEGORY categoryId="13" categoryName="Räntefonder">
<FUNDS></FUNDS>
</CATEGORY>
</CATEGORIES>
</TYPE>
</TYPES>
</MONEYMATE_XML_FEED>

So I hope to get a table with data from FUNDS only, in this form:

fundid   fundName   fundCurrency   fundCompanyId   perfEndMonth   perfMonth
…        …          …              …               …              …

etc.

The output should be a CSV file; I show it as a table here only so as not to break the formatting.

Please note that perfMonth is the key value. It sits at the end of the long FUND line in the data example above, which did not wrap in the code box.

1 Answer

I used lxml.

import csv

import lxml.etree

x = u'''<!-- MoneyMate (tm) XMLPerfs Application version 1.0.1.1 - Copyright 2000 MoneyMate Limited. All Rights Reserved. MoneyMate -->
<!-- Discrete Perfs for 180 periods for Monthly frequency -->
<MONEYMATE_XML_FEED xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://mmia2.moneymate.com/xml/MoneyMateComplete.xsd" version="1.0" calcCurrency="SEK">
    <TYPES>
        <TYPE typeCountry="SE" typeId="85" typeName="string" calcToDate="2013-07-16">
            <COMPANIES>
                <COMPANY companyId="25000068" companyName="string"/>
                <CATEGORIES>
                    <CATEGORY categoryId="1101" categoryName="Aktie -- Asien">
                        <FUNDS>
                            <FUND fundId="6201" fundName="string" fundCurrency="GBP" fundCompanyId="25000068">
                                <PERFORMANCES>
                                    <MONTHLYPERFS>
                                        <PERFORMANCEMONTH perfEndMonth="2006-05-31" perfMonth="-0.087670"/>
                                    </MONTHLYPERFS>
                                </PERFORMANCES>
                            </FUND>
                        </FUNDS>
                    </CATEGORY>
                    <CATEGORY categoryId="13" categoryName="Rntefonder">
                        <FUNDS></FUNDS>
                    </CATEGORY>
                </CATEGORIES>
            </COMPANIES>
        </TYPE>
    </TYPES>
</MONEYMATE_XML_FEED>
'''

# Python 2.7: open in binary mode for the csv module.
# On Python 3 use open('output.csv', 'w', newline='') instead.
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(('fundid', 'fundName', 'fundCurrency', 'fundCompanyId', 'perfEndMonth', 'perfMonth'))
    root = lxml.etree.fromstring(x)
    for fund in root.iter('FUND'):
        # Write one row per PERFORMANCEMONTH, repeating the fund's attributes on each row.
        for perf in fund.iter('PERFORMANCEMONTH'):
            row = fund.get('fundId'), fund.get('fundName'), fund.get('fundCurrency'), fund.get('fundCompanyId'), perf.get('perfEndMonth'), perf.get('perfMonth')
            writer.writerow(row)

NOTE

The XML given in the question has a mismatched tag. You may need to fix that first.
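One way to find where a feed is not well-formed is to let the parser report it. The following is a minimal sketch, not part of the original answer; it uses the standard library's xml.etree.ElementTree and its ParseError exception, and the helper name first_parse_error and the broken fragment are made up for illustration.

```python
import xml.etree.ElementTree as ET


def first_parse_error(xml_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple reported by the parser.
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical fragment with a mismatched closing tag (illustration only).
broken = '<FUNDS>\n<FUND fundId="6201">\n</FUNDS>'
well_formed = '<FUNDS><FUND fundId="6201"/></FUNDS>'

print(first_parse_error(broken))       # a tuple describing the error
print(first_parse_error(well_formed))  # None, the fragment parses cleanly
```

The parser stops at the first error, so a feed with several problems has to be checked again after each fix.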

2 Comments

Thanks, @falsetru. Sadly I cannot have lxml where I need to get this done, but perhaps the general idea still applies.
@László, you can also use xml.etree.ElementTree, because I didn't use any lxml-specific functions here.
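To illustrate the comment above, here is a rough sketch of the same traversal using only the standard library's xml.etree.ElementTree. It assumes one CSV row is wanted per PERFORMANCEMONTH, with the FUND attributes repeated on each row. The names fund_rows and write_csv, and the second PERFORMANCEMONTH in the sample, are made up for illustration and do not come from the real feed.

```python
import csv
import xml.etree.ElementTree as ET

FUND_ATTRS = ('fundId', 'fundName', 'fundCurrency', 'fundCompanyId')
PERF_ATTRS = ('perfEndMonth', 'perfMonth')


def fund_rows(root):
    """Yield one tuple per PERFORMANCEMONTH, repeating its FUND's attributes."""
    for fund in root.iter('FUND'):
        fund_values = tuple(fund.get(name) for name in FUND_ATTRS)
        for perf in fund.iter('PERFORMANCEMONTH'):
            yield fund_values + tuple(perf.get(name) for name in PERF_ATTRS)


def write_csv(path, rows):
    """Write a header plus the rows. Python 3 form; on 2.7 use open(path, 'wb')."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(FUND_ATTRS + PERF_ATTRS)
        writer.writerows(rows)


# Hypothetical, well-formed sample modelled on the feed in the question;
# the second PERFORMANCEMONTH is invented to show the repeated fund columns.
sample = '''<MONEYMATE_XML_FEED>
  <FUND fundId="6201" fundName="string" fundCurrency="GBP" fundCompanyId="25000068">
    <PERFORMANCES><MONTHLYPERFS>
      <PERFORMANCEMONTH perfEndMonth="2006-05-31" perfMonth="-0.087670"/>
      <PERFORMANCEMONTH perfEndMonth="2006-06-30" perfMonth="0.012000"/>
    </MONTHLYPERFS></PERFORMANCES>
  </FUND>
</MONEYMATE_XML_FEED>'''

rows = list(fund_rows(ET.fromstring(sample)))
for row in rows:
    print(row)
```

Calling write_csv('output.csv', rows) would then produce the file. For a real feed on disk, ET.parse(path).getroot() can replace ET.fromstring(sample).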
