I have the following code, which takes information from an XML file and saves some of the data in a CSV file.

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('file.xml')
root = tree.getroot()

title = []
category = []
url = []
prod = []

def find_title():
    for t in root.findall('solution/head'):
        title.append(t.find('title').text)

    for c in root.findall('solution/body'):
        category.append(c.find('category').text)

    for u in root.findall('solution/body'):
        url.append(u.find('video').text)

    for p in root.findall('solution/body'):
        prod.append(p.find('product').text)

find_title()

headers = ['Title', 'Category', 'Video URL','Product']

def save_csv():
    with open('titles.csv', 'w') as f:
        f_csv = csv.writer(f, lineterminator='\r')
        f_csv.writerow(headers)
        f.write(''.join('{},{},{},{}\n'.format(title, category, url, prod) for title, category, url, prod in zip(title, category, url, prod)))

save_csv()

I have found an issue with text that contains ',' because the comma splits the saved output into separate fields, e.g.:

<title>Add, Change, or Remove Transitions between Slides</title>

is getting saved as [Add, Change, or Remove Transitions between Slides], which makes sense since this is a CSV file. However, I would like to keep the whole title together.

Is there any way to remove the ',' from the title text, or can I add more code to handle the ','?

Thanks in advance

  • If you want to keep the commas, you'll get into trouble with CSV; but if it's possible to replace the commas with something else (such as the character -), do it with string replacement. Commented Jan 23, 2018 at 13:36
  • @ZeinabAbbasimazar No, CSV files can handle commas when they are quoted properly. The csv module does that by default. [Edit:] That is, when it is used properly (no direct writing to the file). Commented Jan 23, 2018 at 13:38
  • Why are you using csv.writerow for the headers but not for the data rows? If you use it for the data rows it will handle the quoting/special character issues for you. Commented Jan 23, 2018 at 13:40

1 Answer

It's not clear why you're writing the row data with a file.write() call rather than using the csv writer's writerow method (which you are already using for the header row). Using that method will take care of quoting and special-character issues for data that contains quotes or commas.

Change:

f.write(''.join('{},{},{},{}\n'.format(title, category, url, prod) for title, category, url, prod in zip(title, category, url, prod)))

to:

for row in zip(title, category, url, prod):
    f_csv.writerow(row)

and your CSV should work as expected, assuming your CSV reader handles the quoted fields.
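To illustrate the change, here is a minimal, self-contained sketch. The sample lists are made-up stand-ins for the values read from the XML file, and the output goes to an in-memory buffer rather than titles.csv so that the result can be inspected directly; the helper name write_rows is only for this example.

```python
import csv
import io

# Hypothetical sample data standing in for the lists built from the XML file.
title = ["Add, Change, or Remove Transitions between Slides", "Insert a Picture"]
category = ["Slides", "Media"]
url = ["http://example.com/a", "http://example.com/b"]
prod = ["PowerPoint", "PowerPoint"]

headers = ["Title", "Category", "Video URL", "Product"]


def write_rows(f):
    """Write the header and all data rows through the csv writer."""
    writer = csv.writer(f)
    writer.writerow(headers)
    # writerows quotes any field containing a comma, quote, or newline.
    writer.writerows(zip(title, category, url, prod))


# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_rows(buffer)
csv_text = buffer.getvalue()

# Reading the text back shows the title survives as a single field.
rows = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
print(rows[1][0])
```

The title containing commas is written wrapped in double quotes, and a CSV reader that understands quoted fields returns it as one value again.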

4 Comments

This is the first time I have worked with XML and CSV files, thanks for the heads up. I changed the line and it works perfectly.
Cool. Incidentally, in your original code you set lineterminator='\r' when you create the CSV writer, but your manual line writes were adding '\n'. It's not clear which you actually want, but with this code you'll get '\r' line endings.
Which is fairly unusual (en.wikipedia.org/wiki/Newline). If in doubt, I'd omit the lineterminator param entirely and let it use the default CSV dialect.
Great, understood, and thanks for the information. I really appreciate your feedback.
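For anyone comparing the two line-ending choices discussed in these comments, here is a small sketch that renders the same rows with the default dialect and with lineterminator='\r'. The render helper and the sample rows are made up for illustration, and an in-memory buffer is used so the raw characters can be checked.

```python
import csv
import io


def render(rows, **writer_kwargs):
    """Return the CSV text produced for rows with the given writer options."""
    buf = io.StringIO()
    csv.writer(buf, **writer_kwargs).writerows(rows)
    return buf.getvalue()


# Hypothetical sample rows; the second one has a field containing a comma.
rows = [["Title", "Category"], ["Add, Change", "Slides"]]

# Default dialect ("excel"): rows end with '\r\n'.
default_output = render(rows)

# Same rows with the lineterminator from the question: rows end with '\r' only.
cr_only_output = render(rows, lineterminator="\r")

print(repr(default_output))
print(repr(cr_only_output))
```

When writing to a real file on Python 3, the csv documentation recommends opening it with newline='' (for example open('titles.csv', 'w', newline='')) so that the writer's line endings are not translated a second time.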
