I want to create an XML file in a specific format from a pandas DataFrame. My DataFrame looks something like this:
Doc_ID Doc_Name Doc_Category
abc123 aaa111 c1
abc456 aaa222 c2
I want to convert a dataset like this, with 10k rows, into a single XML file in the following format, where each TEXT element combines the strings from the Doc_Name and Doc_Category columns:
<DOC>
<DOCNO> abc123 </DOCNO>
<TEXT> aaa111 + c1 </TEXT>
</DOC>
<DOC>
<DOCNO> abc456 </DOCNO>
<TEXT> aaa222 + c2 </TEXT>
</DOC>
I tried something like the following, but I am unable to combine all the rows into a single XML file:
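import pandas as pd

testdoc = pd.DataFrame({"Doc_ID": ["abc123", "abc456"], "Doc_Name": ["aaa111", "aaa222"], "Doc_Category": ["c1", "c2"]})

for i, row in testdoc.iterrows():
    xml = ['<DOC>']  # a new list is started for every row
    xml.append('<{0}>{1}</{0}>'.format("DOCNO", row["Doc_ID"]))  # tag names match the target format
    xml.append('<{0}>{1}</{0}>'.format("TEXT", row["Doc_Name"] + row["Doc_Category"]))
    xml.append('</DOC>')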
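In the loop above, `xml` is re-created on every iteration and never written anywhere, so only the last row survives. One way to collect every row is to create the list once, before the loop, and write the joined result to a file afterwards. This is a minimal sketch: the function name `dataframe_to_doc_xml` and the output path `docs.xml` are hypothetical, and joining Doc_Name and Doc_Category with a single space is an assumption about what `aaa111 + c1` means. It does no escaping, so a value containing `&` or `<` would still produce invalid XML.

```python
import pandas as pd


def dataframe_to_doc_xml(df: pd.DataFrame) -> str:
    """Return one string with a <DOC> block for every row of df."""
    xml = []  # created once, outside the loop, so all rows accumulate here
    for _, row in df.iterrows():
        xml.append("<DOC>")
        # XML tag names are case-sensitive: use DOCNO/TEXT as in the target format
        xml.append("<DOCNO> {} </DOCNO>".format(row["Doc_ID"]))
        # Assumption: Doc_Name and Doc_Category are joined with a single space
        xml.append("<TEXT> {} {} </TEXT>".format(row["Doc_Name"], row["Doc_Category"]))
        xml.append("</DOC>")
    return "\n".join(xml) + "\n"


if __name__ == "__main__":
    testdoc = pd.DataFrame(
        {
            "Doc_ID": ["abc123", "abc456"],
            "Doc_Name": ["aaa111", "aaa222"],
            "Doc_Category": ["c1", "c2"],
        }
    )
    # Hypothetical output path; the data is written as-is, without escaping
    with open("docs.xml", "w", encoding="utf-8") as f:
        f.write(dataframe_to_doc_xml(testdoc))
```

Note that the result has several top-level `<DOC>` elements and no single root element, so a strict XML parser will reject the file as a whole unless it is wrapped in an enclosing element.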
How can I do this? It would also be nice to have a handler for invalid characters.
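A minimal sketch of an invalid-character handler, assuming "invalid" means both the characters that XML 1.0 does not allow at all (most control characters, lone surrogates, U+FFFE/U+FFFF) and the markup characters `&`, `<` and `>` that must be escaped. It uses the standard library's `xml.sax.saxutils.escape` plus a regular expression; `clean_xml_text` and `clean_frame` are hypothetical helper names, and silently dropping illegal characters (rather than replacing them) is a design choice, not a requirement.

```python
import re
from xml.sax.saxutils import escape

import pandas as pd

# Matches code points that XML 1.0 forbids outright: most C0 control
# characters, lone surrogates and U+FFFE/U+FFFF. Tab, LF and CR are allowed.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(value) -> str:
    """Return value as text that is safe inside an XML element."""
    if pd.isna(value):  # NaN/None cells become an empty string
        return ""
    text = _INVALID_XML_CHARS.sub("", str(value))  # drop illegal characters
    return escape(text)  # escape &, < and >


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Apply clean_xml_text to every cell of the three text columns."""
    cleaned = df.copy()
    for col in ["Doc_ID", "Doc_Name", "Doc_Category"]:
        cleaned[col] = cleaned[col].map(clean_xml_text)
    return cleaned
```

The cleaned frame can then be passed to whatever loop builds the `<DOC>` blocks, so the cleaning and the formatting stay separate.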
Thanks!