I recently asked a question that was closed, so I am trying to make it less broad. My issue is that I don't know where to begin with the problem, so I can't really show what I have 'already tried'. I have been unable to find anything online that helps.
I have an open source XML file that follows this format:
<surnames>
<cluster>
<surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
<surname lang="en" text="Ahern" anchor="Ahern"/>
<surname lang="en" text="Aherne" anchor="Aherne"/>
<surname lang="en" text="Ahearne" anchor="Ahearne"/>
</cluster>
<cluster>
<surname lang="en" text="Achison" anchor="Achison"/>
<surname lang="en" text="Atchison" anchor="Atchison"/>
</cluster>
<cluster>
<surname lang="en" text="Adams" anchor="Adams"/>
<surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
</cluster>
<cluster>
<surname lang="ga" text="Ághas" anchor="Ághas"/>
<surname lang="en" text="Ashe" anchor="Ashe"/>
<surname lang="ga" text="Ás" anchor="Ás"/>
</cluster>
<cluster>
<surname lang="en" text="Young" anchor="Young"/>
<surname lang="ga" text="Ó Hógáin" anchor="Hógáin"/>
<surname lang="ga" text="de Siún" anchor="Siún"/>
</cluster>
</surnames>
Essentially, I want to convert this to a CSV file that looks like the following, with each cluster on its own row (only the first three clusters are shown):
Achaorainn,Ahern,Aherne,Ahearne
Achison,Atchison
Adams,Mac Conamha
I have never tried anything like this, so even just pointing me in the right direction would be a massive help.
I thought about converting the XML to a pandas DataFrame and then writing that out as CSV.
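I tried this as a starting point, but I couldn't get it to work. It fails at the objectify.parse stage because objectify is never imported (only ElementTree is), and child.text would be empty anyway, since each surname is stored in the text attribute rather than as element text. Here is the attempt with those problems fixed and without the final transpose, so each cluster stays a row:
import pandas as pd
import xml.etree.ElementTree as ET
#%%
# Parse the file and get the <surnames> root element
tree = ET.parse('surnames_reduced.xml')
root = tree.getroot()
# Build one list per <cluster>, holding each surname's "text" attribute
data = []
for cluster in root:
    data.append([surname.get('text') for surname in cluster])
# One row per cluster; shorter clusters are padded with None
df = pd.DataFrame(data)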
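For the conversion described above (one CSV row per cluster), a minimal sketch using only the standard library might look like the following. It assumes every surname sits directly under a cluster element and that the value wanted is the text attribute. The helper names clusters_to_rows and write_rows are hypothetical, and the sample string is a shortened copy of the data in the question so the snippet runs on its own.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the sample data from the question (assumption: the
# real file has the same <surnames>/<cluster>/<surname> layout).
SAMPLE_XML = """<surnames>
<cluster>
<surname lang="ga" text="Achaorainn" anchor="Achaorainn"/>
<surname lang="en" text="Ahern" anchor="Ahern"/>
</cluster>
<cluster>
<surname lang="en" text="Adams" anchor="Adams"/>
<surname lang="ga" text="Mac Conamha" anchor="Conamha"/>
</cluster>
</surnames>"""


def clusters_to_rows(root):
    """Return one list of surname texts per <cluster> element."""
    return [
        [surname.get("text") for surname in cluster.findall("surname")]
        for cluster in root.findall("cluster")
    ]


def write_rows(rows, handle):
    """Write each cluster as one CSV row; rows may differ in length."""
    csv.writer(handle, lineterminator="\n").writerows(rows)


root = ET.fromstring(SAMPLE_XML)
rows = clusters_to_rows(root)

# Write to an in-memory buffer so the result can be inspected directly.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To run this against the real file, the root could instead come from ET.parse('surnames_reduced.xml').getroot(), and the handle from open('surnames.csv', 'w', newline='', encoding='utf-8'), where surnames.csv is just a placeholder output name. Skipping pandas here avoids the padding that a DataFrame adds to shorter clusters, which would otherwise appear as trailing commas in the CSV.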