13

I am looking for a way to automate the conversion of CSV to XML.

Here is an example of a CSV file, containing a list of movies:

Movies Csv

Here is the file in XML format:

<collection shelf="New Arrivals">
<movietitle="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movietitle="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movietitle="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movietitle="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

I've tried a few examples where I am able to read the csv and XML format using Python using DOM and SAX but yet am to find a simple example of the conversion. So far I have:

import csv              
f = open('movies2.csv')
csv_f = csv.reader(f)   

def convert_row(row):
   return """<movietitle="%s">
   <type>%s</type>
   <format>%s</format>
   <year>%s</year>
   <rating>%s</rating>
   <stars>%s</stars>
   <description>%s</description>
</movie>""" % (
   row.Title, row.Type, row.Format, row.Year, row.Rating, row.Stars, row.Description)

print ('\n'.join(csv_f.apply(convert_row, axis=1)))

But I get the error:

 File "moviesxml.py", line 16, in module
   print ('\n'.join(csv_f.apply(convert_row, axis=1)))
AttributeError: '_csv.reader' object has no attribute 'apply'

I am pretty new to Python, so any help would be much appreciated!

I am using Python 3.5.2.

Thanks!

Lisa

3
  • 1
    Please edit your question and include the Python code you have tried. Does it work? Commented Dec 9, 2016 at 11:23
  • hi, for some reason my code for DOM and SAX corrupted and I forgot where I found the sample code. I've included the import csv method Commented Dec 9, 2016 at 11:34
  • Future readers, avoid most answers on this post. See What''s so bad about building XML with string concatenation?. XML is not a text file to be built with string formatting. Use appropriate DOM libraries like Python's etree or lxml. Commented May 27, 2023 at 17:48

5 Answers 5

23

A possible solution is to first load the csv into Pandas and then convert it row by row into XML, as so:

import pandas as pd
df = pd.read_csv('untitled.txt', sep='|')

With the sample data (assuming separator and so on) loaded as:

          Title                   Type Format  Year Rating  Stars  \
0  Enemy Behind           War,Thriller    DVD  2003     PG     10   
1  Transformers  Anime,Science Fiction    DVD  1989      R      9   

             Description  
0          Talk about...  
1  A Schientific fiction  

And then converting to xml with a custom function:

def convert_row(row):
    return """<movietitle="%s">
    <type>%s</type>
    <format>%s</format>
    <year>%s</year>
    <rating>%s</rating>
    <stars>%s</stars>
    <description>%s</description>
</movie>""" % (
    row.Title, row.Type, row.Format, row.Year, row.Rating, row.Stars, row.Description)

print '\n'.join(df.apply(convert_row, axis=1))

This way you get a string containing the xml:

<movietitle="Enemy Behind">
    <type>War,Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about...</description>
</movie>
<movietitle="Transformers">
    <type>Anime,Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>9</stars>
    <description>A Schientific fiction</description>
</movie>

that you can dump in to a file or whatever.

Inspired by this great answer.


Edit: Using the loading method you posted (or a version that actually loads the data to a variable):

import csv              
f = open('movies2.csv')
csv_f = csv.reader(f)   
data = []

for row in csv_f: 
   data.append(row)
f.close()

print data[1:]

We get:

[['Enemy Behind', 'War', 'Thriller', 'DVD', '2003', 'PG', '10', 'Talk about...'], ['Transformers', 'Anime', 'Science Fiction', 'DVD', '1989', 'R', '9', 'A Schientific fiction']]

And we can convert to XML with minor modifications:

def convert_row(row):
    return """<movietitle="%s">
    <type>%s</type>
    <format>%s</format>
    <year>%s</year>
    <rating>%s</rating>
    <stars>%s</stars>
    <description>%s</description>
</movie>""" % (row[0], row[1], row[2], row[3], row[4], row[5], row[6])

print '\n'.join([convert_row(row) for row in data[1:]])

Getting identical results:

<movietitle="Enemy Behind">
    <type>War</type>
    <format>Thriller</format>
    <year>DVD</year>
    <rating>2003</rating>
    <stars>PG</stars>
    <description>10</description>
</movie>
<movietitle="Transformers">
    <type>Anime</type>
    <format>Science Fiction</format>
    <year>DVD</year>
    <rating>1989</rating>
    <stars>R</stars>
    <description>9</description>
</movie>
Sign up to request clarification or add additional context in comments.

11 Comments

Why Pandas? The Python Standard Library contains a CSV module.
@LutzHorn No particular reason. OP didn't specify what method was used to load the data at first, and Pandas is a great library that I'm used to.
hi thanks for your post. I tried to install panda using 'sudo pip install pandas' but I get the message 'ImportError: no module named 'pandas''
Also (sorry i'm a real newbie) how to I save this into as an XML file - movies.xml
@LMarfell My bad, it should be something like with with open('out.xml', 'w') as f: f.write('\n'.join([convert_row(row) for row in data]))
|
3

I tried to generalize robertoia's function convert_row for any header instead of writing it by hand.

import csv  
import pandas as pd
            
f = open('movies2.csv')
csv_f = csv.reader(f)   
data = []

for row in csv_f: 
   data.append(row)
f.close()

df = pd.read_csv('movies2.csv')
header= list(df.columns)

def convert_row(row):
     str_row = """<%s>%s</%s> \n"""*(len(header)-1)
     str_row = """<%s>%s""" +"\n"+ str_row + """</%s>"""
     var_values = [list_of_elments[k] for k in range(1,len(header)) for list_of_elments in [header,row,header]]
     var_values = [header[0],row[0]]+var_values+[header[0]]
     var_values =tuple(var_values)
     return str_row % var_values

text ="""<collection shelf="New Arrivals">"""+"\n"+'\n'.join([convert_row(row) for row in data[1:]])+"\n" +"</collection >"
print(text)
with open('output.xml', 'w') as myfile: 
  myfile.write(text)

Of course with pandas now, it is simpler to just use to_xml() :

df= pd.read_csv('movies2.csv')
with open('outputf.xml', 'w') as myfile: 
  myfile.write(df.to_xml())

Comments

1

I found an easier way to insert variables into a string or block of text:

'''Twas brillig and the slithy {what}
Did gyre and gimble in the {where}
All {how} were the borogoves
And the {who} outgrabe.'''.format(what='toves',
                                  where='wabe',
                                  how='mimsy',
                                  who='momeraths')

Alternatively:

'''Twas brillig and the slithy {0}
Did gyre and gimble in the {1}
All {2} were the borogoves
And the {3} outgrabe.'''.format('toves',
                                'wabe',
                                'mimsy',
                                'momeraths')

(substitute name of incoming data variable for 'toves', 'wabe', 'mimsy', and 'momeraths')

Comments

0

Can be done without any libraries too simple script as below assumes the CSV file as cvs_to_xml_data.csv with single column value.

"""This script is to create xmlfile.txt file from a given csv_to_xml_data.csv file"""
row_list = []
with open('csv_to_xml_data.csv', 'r') as csvfile:
    row_list = csvfile.read().splitlines()

with open('xmlfile.txt', 'w') as txtfile:
    for row in row_list:
        txtfile.write(f"""<Student>
    <studentclass>8thstd</studentclass>
    <studentsubject>math</<studentsubject>
    <stuItem>{row}<stuItem>
    <stuItemSting>I am {row} and I like to be {row} because I am {row} 
</stuItemSting>
</Student>""")    

Comments

0

I would modify, simplify a bit xml ouput structure (if possible), and then you will be able to use this simple script. Just instead of movietitle="Piti-piti pa" you would have:

<movie>
<title>Piti-piti Pa</title>
...
</movie>

Variations of the prompt:

echo -e 'id,name,surname\n1,"Jane","Flame"\n2,"John","Ash"' | ./csv2xml.py > outputfile.xml
echo -e 'id,name,surname\n1,"Jane","Flame"\n2,"John","Ash"' > inputfile.csv
./csv2xml.py < inputfile.csv > outputfile.xml 

Example of outputfile.xml:

<document>
 <record>
  <id>1</id>
  <name>Jane</name>
  <surname>Flame</surname>
 </record>
 <record>
  <id>2</id>
  <name>John</name>
  <surname>Ash</surname>
 </record>
</document>

The program itself:

#!/usr/bin/python3
import csv
import fileinput

csv.register_dialect('custom',
                     delimiter=',',
                     doublequote=True,
                     escapechar=None,
                     quotechar='"',
                     quoting=csv.QUOTE_MINIMAL,
                     skipinitialspace=False)
first_line=True
data = csv.reader(fileinput.input(), dialect='custom')
print("<document>")
for record in data:
    if first_line==True:
        first_line=False
        name_of_fields=record
    else:
        print(" <record>")
        for i, field in enumerate(record):
            print("  <" + name_of_fields[i] + ">" + field + "</" + name_of_fields[i] + ">" )
        print(" </record>")
print("</document>")

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.