
I have an XML file that I want to convert to CSV using Python. I need the contents of the TestitemName tags as the CSV headers and the contents of the Testvalue tags as the values in the CSV. Can someone help me with this?

Sample XML file (input)

<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>1</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hi</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field3</sample:TestitemName>
            <sample:Testvalue>1234</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
    <sample:TestData>
        <sample:Testitem>
            <sample:TestitemName>Field1</sample:TestitemName>
            <sample:Testvalue>3</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field2</sample:TestitemName>
            <sample:Testvalue>Hello</sample:Testvalue>
        </sample:Testitem>
        <sample:Testitem>
            <sample:TestitemName>Field3</sample:TestitemName>
            <sample:Testvalue>999</sample:Testvalue>
        </sample:Testitem>
    </sample:TestData>
</sample:batch>

Desired CSV file (Output)

Field1,Field2,Field3 (Header field names)
1,Hi,1234 (1st record)
3,Hello,999 (2nd record)

2 Answers


BeautifulSoup can parse XML data. Because this data is well organized, you just need to loop over the nested tag types and collect the values as you go.

Code:

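from bs4 import BeautifulSoup as Soup  # BeautifulSoup 4: pip install beautifulsoup4

def parse_xml(file_like):
    """Return a header row of field names followed by one row per TestData."""
    data = []   # one dict per <sample:TestData> element
    names = []  # field names in first-seen order, used as the CSV header
    # html.parser lowercases tag names, so the searches below use lowercase names
    soup = Soup(file_like, 'html.parser')
    for batch in soup.find_all('sample:batch'):
        for test_data in batch.find_all('sample:testdata'):
            item = {}
            for test_item in test_data.find_all('sample:testitem'):
                name = test_item.find('sample:testitemname').text
                value = test_item.find('sample:testvalue').text
                item[name] = value
                if name not in names:
                    names.append(name)
            data.append(item)

    # A field missing from a record becomes '' so every row can be joined
    return [names] + [[datum.get(name, '') for name in names] for datum in data]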

Test Code:

data = parse_xml(xml_data)
for datum in data:
    print(','.join(datum))

Test Data:

from io import StringIO
xml_data = StringIO(u"""
    <sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
        <sample:TestData>
            <sample:Testitem>
                <sample:TestitemName>Field1</sample:TestitemName>
                <sample:Testvalue>1</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field2</sample:TestitemName>
                <sample:Testvalue>Hi</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field3</sample:TestitemName>
                <sample:Testvalue>1234</sample:Testvalue>
            </sample:Testitem>
        </sample:TestData>
        <sample:TestData>
            <sample:Testitem>
                <sample:TestitemName>Field1</sample:TestitemName>
                <sample:Testvalue>3</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field2</sample:TestitemName>
                <sample:Testvalue>Hello</sample:Testvalue>
            </sample:Testitem>
            <sample:Testitem>
                <sample:TestitemName>Field3</sample:TestitemName>
                <sample:Testvalue>999</sample:Testvalue>
            </sample:Testitem>
        </sample:TestData>
    </sample:batch>
""")

Results:

Field1,Field2,Field3
1,Hi,1234
3,Hello,999
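If you would rather avoid a third-party dependency, the same loop can be sketched with the standard library's `xml.etree.ElementTree`. This is a different technique from the BeautifulSoup code above, not part of it. It assumes the input is well-formed XML (ElementTree, unlike BeautifulSoup's lenient HTML parser, rejects unclosed tags) and that the namespace URI matches the sample's `xmlns:sample` declaration. The names `xml_to_rows`, `NS`, and `SAMPLE` are hypothetical.

```python
import csv
import io
import sys
import xml.etree.ElementTree as ET

# Assumed namespace: copied from the sample's xmlns:sample declaration
NS = {'sample': 'http://sample.com/schema/sampleimport'}

def xml_to_rows(source):
    """Hypothetical helper: parse the batch XML into [header, row1, row2, ...]."""
    root = ET.parse(source).getroot()  # the <sample:batch> element
    names = []    # field names in first-seen order
    records = []  # one dict per <sample:TestData>
    for test_data in root.findall('sample:TestData', NS):
        record = {}
        for item in test_data.findall('sample:Testitem', NS):
            name = item.findtext('sample:TestitemName', default='', namespaces=NS)
            value = item.findtext('sample:Testvalue', default='', namespaces=NS)
            record[name] = value
            if name not in names:
                names.append(name)
        records.append(record)
    # Missing fields become '' so every row has one cell per header
    return [names] + [[r.get(n, '') for n in names] for r in records]

SAMPLE = """<sample:batch xmlns:sample="http://sample.com/schema/sampleimport">
  <sample:TestData>
    <sample:Testitem><sample:TestitemName>Field1</sample:TestitemName><sample:Testvalue>1</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field2</sample:TestitemName><sample:Testvalue>Hi</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field3</sample:TestitemName><sample:Testvalue>1234</sample:Testvalue></sample:Testitem>
  </sample:TestData>
  <sample:TestData>
    <sample:Testitem><sample:TestitemName>Field1</sample:TestitemName><sample:Testvalue>3</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field2</sample:TestitemName><sample:Testvalue>Hello</sample:Testvalue></sample:Testitem>
    <sample:Testitem><sample:TestitemName>Field3</sample:TestitemName><sample:Testvalue>999</sample:Testvalue></sample:Testitem>
  </sample:TestData>
</sample:batch>"""

rows = xml_to_rows(io.StringIO(SAMPLE))
csv.writer(sys.stdout, lineterminator='\n').writerows(rows)
# prints:
# Field1,Field2,Field3
# 1,Hi,1234
# 3,Hello,999
```

Passing namespace prefixes through the `NS` mapping keeps the search paths readable; the alternative is spelling out `{http://sample.com/schema/sampleimport}TestData` in every call.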

2 Comments

Thanks Stephen, it worked! I would like to write the output to a CSV file. Can you help me again?
The output I have shown is already CSV. Just write it to a file instead of printing it to the screen.
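A minimal sketch of writing the rows to a file with the standard `csv` module, assuming rows in the shape `parse_xml()` returns (a header row followed by one list per record). `write_csv` and the temporary output path are hypothetical. Compared with `','.join`, `csv.writer` also quotes values that contain commas, quotes, or newlines.

```python
import csv
import os
import tempfile

def write_csv(rows, path):
    """Hypothetical helper: write rows (header first) to path as CSV."""
    # newline='' lets the csv module control line endings, as its docs recommend
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)

# Example rows in the shape parse_xml() returns
rows = [['Field1', 'Field2', 'Field3'], ['1', 'Hi', '1234'], ['3', 'Hello', '999']]
out_path = os.path.join(tempfile.gettempdir(), 'output.csv')  # hypothetical location
write_csv(rows, out_path)

# Read the file back to check the round trip
with open(out_path, newline='') as f:
    read_back = list(csv.reader(f))
```

With the answer's function, the call would be `write_csv(parse_xml(xml_data), 'output.csv')`.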

Use pyxmlparser

It is a command-line utility that does the same thing.

https://pypi.org/project/pyxmlparser/

Disclaimer: I am the author of the library. Since it is new, I would be happy to hear whether it works for you.
