I have XML data that also contains HTML. I'm trying to write this XML into a single cell of a CSV file that also has other columns. Right now the XML gets split across several adjacent cells, so reading the CSV with pandas throws an error:

Error tokenizing data. C error: Expected 94 fields in line 3, saw 221

I also looked into a similar question, but it didn't help because the data there came from a database, so the workarounds don't carry over.

I am not looking to parse the XML data. I just want to save the entire XML data into one cell in a csv file.

I cannot share a snapshot of the data for confidentiality reasons, but I hope the issue is clear.

Any help is appreciated.

2 Answers

2

You can use the built-in csv module. Wrap the XML string in a list so that it is written as a single field:

import csv

xml = ["""<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
</catalog>"""]

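# newline="" lets the csv module control line endings, as the csv docs recommend
with open("test.csv", "w", encoding="utf8", newline="") as out_file:
    writer = csv.writer(out_file)
    # Each list element becomes one cell; embedded newlines and quotes get quoted
    writer.writerow(xml)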

You should then be able to read it with pandas.
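To see why this keeps the XML in one cell, here is a minimal sketch that writes a row with other columns next to the XML and reads it back with the same module. The column names and the `xml_payload` string are made up for illustration; the point is that `csv.writer` quotes any field containing commas, double quotes, or newlines, so a reader that honours quoting sees exactly one field.

```python
import csv
import io

# Hypothetical payload: contains a comma, double quotes, and newlines,
# the three things that break a naively written CSV row.
xml_payload = '<note id="1">\n  <body>Hello, <b>world</b></body>\n</note>'

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name", "xml"])          # illustrative column names
writer.writerow([1, "example", xml_payload])    # the XML is just one list element

raw_text = buffer.getvalue()

# Read it back; newline="" leaves line endings untouched for the csv module.
rows = list(csv.reader(io.StringIO(raw_text, newline="")))
header, record = rows

print(len(record))                 # number of cells in the data row
print(record[2] == xml_payload)    # whether the XML survived the round trip
```

If the file in the question was built by joining strings with commas rather than through a CSV writer, that would explain the extra fields, though that is only a guess without seeing the code that produced it.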


1
import pandas as pd


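# Read the whole XML file into a single string
with open('note.xml', 'r') as f:
    data = f.read()

# One row, one column: the entire XML document sits in a single cell
df = pd.DataFrame(data={'xml_file': [data]})

# index=False keeps the row index out of the file
df.to_csv('xml_as_csv.csv', index=False)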
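As a quick check that the pandas route also works when there are other columns, here is a small round-trip sketch. The `id` and `name` columns and the `xml_payload` value are assumptions for illustration only. `DataFrame.to_csv` returns the CSV text when no path is given, which avoids touching the disk.

```python
import io

import pandas as pd

# Hypothetical XML with a comma, double quotes, and newlines inside it.
xml_payload = '<note id="1">\n  <body>Hello, <b>world</b></body>\n</note>'

# The XML sits in one column next to ordinary columns (names are illustrative).
df = pd.DataFrame({"id": [1], "name": ["example"], "xml_file": [xml_payload]})

# With no path argument, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)

# Reading it back should give one row and three columns, XML intact.
restored = pd.read_csv(io.StringIO(csv_text))

print(restored.shape)
print(restored.loc[0, "xml_file"] == xml_payload)
```

pandas quotes the XML field on the way out and un-quotes it on the way in, so no extra quoting options should be needed for this case.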
