-1

I have a csv that I need to convert to XML using Python. I'm a novice python dev.

Example CSV data:

Amount,Code
CODE50,1246
CODE50,6290
CODE25,1077
CODE25,9790
CODE100,5319
CODE100,4988

Necessary output XML

<coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>1246</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
<coupon-codes/>
<coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
<coupon-codes/>

My guess is I have to use pandas to pull the csv in, use pandas.groupby to group the Amount column, then push this into an element/subelement to create the xml, then print/push to a xml file. I can't get the groupby to work and don't know how to then push that into the element, then populate the sub element.

4
  • 1) ElementTree is giving me trouble - please explain 2) What is the logic of the csv to xml conversion - please explain. Commented Nov 15, 2021 at 18:59
  • @balderman unsure I understand the question, I mean its giving me trouble in that I can't get it to work, using the scraps of different code i've pulled off the web. I've had success with direct csv to xml wherein each row is an element, but I can't figure out how to group by a column, and have that spit out the appropriate xml. Commented Nov 15, 2021 at 19:07
  • share the code so we can see the trouble Commented Nov 15, 2021 at 19:08
  • - no need for any external lib. ElementTree is good enough here. See my answer. Commented Nov 15, 2021 at 19:22

4 Answers 4

1

why not use df.to_xml()? This simple api will generate xml from the dataframe

Sign up to request clarification or add additional context in comments.

Comments

0

ElementTree can do it without a trouble. See below

from collections import defaultdict
from xml.etree.ElementTree import Element, SubElement,ElementTree

data = defaultdict(list)

with open('in.txt') as f:
  next(f)
  for line in f:
    parts = line.split(',')
    data[parts[0]].append(parts[1].strip())

root = Element('root')
for k,v in data.items():
  sub = SubElement(root,'coupon-codes',attrib={'coupon-id':k})
  for vv in v:
    sub_v = SubElement(sub,'code')
    sub_v.text = vv

tree = ElementTree(root)
with open('out.xml', 'w') as f:
  tree.write(f, encoding='unicode')

3 Comments

this output is perfect! How do I then write that to a file? I'm trying just root.write('test.xml') but am getting an attributeError 'xml.etree.ElementTree.Element' object has no attribute 'write'
Code was updated - the xml document is written to a file.
you're the best. Thanks!
0
import xml.etree.ElementTree as ET
import pandas as pd

df = pd.read_csv('path/to/csvfile.csv')

root = ET.Element('root')
for coupon_id, df_group in df.groupby('Amount'):
    coupon_codes = ET.Element('coupon-codes', {'coupon-id': coupon_id})
    for coupon_code in df_group['Code']:
        code = ET.Element('code')
        code.text = str(coupon_code)
        coupon_codes.append(code)
    root.append(coupon_codes)

To convert it to a minified string, use this:

min_xml = ET.tostring(root, encoding='utf8')
print(min_xml)

Output:

<root><coupon-codes coupon-id="CODE100"><code>5319</code><code>4988</code></coupon-codes><coupon-codes coupon-id="CODE25"><code>1077</code><code>9790</code></coupon-codes><coupon-codes coupon-id="CODE50"><code>1246</code><code>6290</code></coupon-codes></root>

To convert it to a formatted string, use this:

ET.indent(root)
pretty_xml = ET.tostring(root, encoding='utf8')
print(pretty_xml)

Output:

<root>
  <coupon-codes coupon-id="CODE100">
    <code>5319</code>
    <code>4988</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE25">
    <code>1077</code>
    <code>9790</code>
  </coupon-codes>
  <coupon-codes coupon-id="CODE50">
    <code>1246</code>
    <code>6290</code>
  </coupon-codes>
</root>

Comments

0

I can't comment yet, but is better if you put the code that you used for trying to solve the problem.

But the way there is a very similar question here, simple-csv-to-xml-conversion-python

And here is a post for grouping by, group by/cumcount

With this information i achieved it, by doing this

df = pd.DataFrame(['CODE50',
    'CODE50',
    'CODE25',
    'CODE25',
    'CODE100',
    'CODE100'],columns=['code'])

df['amount'] = [1246,
    6290,
    1077,
    9790,
    5319,
    4988]

df['columns'] = df.groupby('code')['amount'].cumcount()
result = df.pivot(index='code', columns='columns')

def convert(df):
    str = ''
    for a in df.index:
        str += '\n'f"""<coupon-codes coupon-id="{a}">"""
        for b in df['amount'].loc[a]:
            str += '\n''       'f"<code>{b}</code>"
    return str
xml = convert(result)
print(xml)

result

<coupon-codes coupon-id="CODE100">
       <code>5319</code>
       <code>4988</code>
<coupon-codes coupon-id="CODE25">
       <code>1077</code>
       <code>9790</code>
<coupon-codes coupon-id="CODE50">
       <code>1246</code>
       <code>6290</code>

1 Comment

@user17242583 The output is the same, as the only difference is that is not sorted. I will check your answer as i haven't used xml.etree and change your output because the question has changed

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.