I am new to Python and have little experience with the language. I have a CSV file whose data I need to convert into an XML structure. I want to do this with Pandas and ElementTree.

I read a tutorial on how to do this, but I can't understand the structure of the code.

The CSV file looks something like this:

test_name,health_feat,result
test_1,20,1
test_2,23,1
test_3,24,0
test_4,12,1
test_5,45,0
test_6,34,1
test_7,78,1
test_8,23,1
test_9,12,1
test_10,12,1

The final XML file should look like this, but I am not sure how to set attributes when using ElementTree:

<?xml version='1.0' encoding='UTF-8'?>
<Tests>
    <Test Testname='test_1'>
        <Health_Feat>20</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_2'>
        <Health_Feat>23</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_3'>
        <Health_Feat>24</Health_Feat>
        <Result>0</Result>
    </Test>
    <Test Testname='test_4'>
        <Health_Feat>12</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_5'>
        <Health_Feat>45</Health_Feat>
        <Result>0</Result>
    </Test>
    <Test Testname='test_6'>
        <Health_Feat>34</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_7'>
        <Health_Feat>78</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_8'>
        <Health_Feat>23</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_9'>
        <Health_Feat>12</Health_Feat>
        <Result>1</Result>
    </Test>
    <Test Testname='test_10'>
        <Health_Feat>12</Health_Feat>
        <Result>1</Result>
    </Test>
</Tests>

So far I have tried something like this, but I don't know how to tell the program which line to take from the CSV:

import pandas as pd
from lxml import etree as et
import uuid

df = pd.read_csv('mytests.csv', sep = ',')

root = et.Element(Tests)

for index, row in df.iterrows():
    if row['test_name'] == 'test_1':
        Test = et.SubElement(root, 'Test')
        Test.attrib['fileUID']
        health_feat = et.subElement('health_feat')
        Result = et.subElement('Result')
    else:
        Tests = et.subElement(root, 'Tests')
        
et.ElementTree(root).write('mytests.xml', pretty_print = True, xml_declaration = True, encoding = 'UTF-8', standalone = None)
  • Hello, Johannes. Can you show us the code of your attempt and tell us what went differently from what you expected? Commented Aug 12, 2019 at 11:42
  • possible duplicate stackoverflow.com/questions/41059264/… Commented Aug 12, 2019 at 11:47
  • @Bonifacio2 I have not written that much so far. I read a tutorial on how to do it, but they have a different structure in their XML. Commented Aug 12, 2019 at 12:47

1 Answer

Something like this:

import pandas as pd
df = pd.read_csv('your_csv.csv', sep=',')


def csv_to_xml(row):
    return """<Test Testname="%s">
    <Health_Feat>%s</Health_Feat>
    <Result>%s</Result>
    </Test>""" % (row.test_name, row.health_Feat, row.Result)

Then call the function for every row of your CSV in a for loop and join the results inside a root <Tests> element.
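
Since the question asks specifically about attributes with ElementTree, here is a minimal sketch of that route using the standard library's xml.etree.ElementTree instead of string formatting. The inline CSV_TEXT and the helper name dataframe_to_xml are assumptions made for illustration; in practice you would read your own file with pd.read_csv. ET.indent needs Python 3.9 or newer.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Inline stand-in for the CSV file from the question (first three rows only).
CSV_TEXT = """test_name,health_feat,result
test_1,20,1
test_2,23,1
test_3,24,0
"""


def dataframe_to_xml(df):
    """Build a <Tests> tree with one <Test> element per DataFrame row."""
    root = ET.Element("Tests")
    for row in df.itertuples(index=False):
        # Attributes can be passed as keyword arguments to SubElement
        # (test.set("Testname", ...) would work as well).
        test = ET.SubElement(root, "Test", Testname=str(row.test_name))
        # Element text must be a string, so convert the numeric values.
        ET.SubElement(test, "Health_Feat").text = str(row.health_feat)
        ET.SubElement(test, "Result").text = str(row.result)
    return root


df = pd.read_csv(io.StringIO(CSV_TEXT))
root = dataframe_to_xml(df)
ET.indent(root, space="    ")  # pretty-print in place; Python 3.9+
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

To write a file with an XML declaration, something like ET.ElementTree(root).write('mytests.xml', encoding='UTF-8', xml_declaration=True) should do. Compared with string formatting, building the tree this way also takes care of escaping characters such as & or < in the values.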

3 Comments

Thanks for your help, I really appreciate your effort in helping me with that.
So the function you sent me would be the extractor for one row of my CSV file, right?
Yes. If you loop over the rows like this, for row in df.itertuples():, and call the function each time, you will get all rows.
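
One thing to watch out for when writing that loop: iterating over a DataFrame directly yields its column labels, not its rows. A small sketch of the difference, using a two-row inline CSV as an assumed stand-in for the real file:

```python
import io

import pandas as pd

# Two-row stand-in for the CSV file from the question.
df = pd.read_csv(io.StringIO("test_name,health_feat,result\ntest_1,20,1\ntest_2,23,1\n"))

# Iterating the DataFrame itself walks over the column labels.
column_labels = [item for item in df]

# itertuples() yields one namedtuple per row, so attribute access
# such as row.test_name works inside the loop.
row_names = [row.test_name for row in df.itertuples(index=False)]

print(column_labels)
print(row_names)
```

df.iterrows() would also work here, since it yields (index, Series) pairs, but itertuples() is usually the faster of the two.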
