I'm having some trouble parsing an XML file. After searching on here I've got close to what I need, but I'm having issues un-nesting some of the deeper data.
This is my XML data:
xml = """
<instance>
<ID>1</ID>
<start>0</start>
<end>16.56</end>
<code>8. Kego Furuhasi</code>
<label>
<group>Team</group>
<text>Celtic FC</text>
</label>
<label>
<group>Action</group>
<text>Positional attacks</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>52.5</pos_x>
<pos_y>34.0</pos_y>
</instance>
<instance>
<ID>2</ID>
<start>0</start>
<end>16.56</end>
<code>8. Kego Furuhasi</code>
<label>
<group>Team</group>
<text>Celtic FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>52.5</pos_x>
<pos_y>34.0</pos_y>
</instance>
<instance>
<ID>3</ID>
<start>0</start>
<end>18.8</end>
<code>42. Kollum MakGregor</code>
<label>
<group>Team</group>
<text>Celtic FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>45.2</pos_x>
<pos_y>34.3</pos_y>
</instance>
"""
And my current code:
import xml.etree.ElementTree as Xet
import pandas as pd

cols = ["ID", "Start", "End", "Player", "Team", "Action", "Half", "x", "y"]
rows = []

# Parsing the XML file
xmlparse = Xet.parse(r'/content/gdrive/MyDrive/Celtic_Dundee.xml')
root = xmlparse.getroot()

# Build one row per <instance> element
for i in root:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player = i.find("code").text
    Team = i.find("label/0/text")
    Action = i.find("label/1/text")
    Half = i.find("label/2/text")
    x = i.find("pos_x").text
    y = i.find("pos_y").text
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Player,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})

df = pd.DataFrame(rows, columns=cols)

# Writing dataframe to csv
df.to_csv('output.csv')
Running that code gives me a CSV, but it has some errors in it: the Team, Action and Half
columns come back with no data in them.
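A minimal sketch of what seems to be going on with those three columns, assuming the standard library's ElementTree path rules: a path step such as `0` is read as a tag name, so `label/0/text` looks for a child element literally called `<0>` and `find()` returns None. Positional lookups use square brackets and count from 1. The single-instance sample here is trimmed from the data above purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-instance sample, trimmed from the data in the question.
inst = ET.fromstring("""
<instance>
  <label><group>Team</group><text>Celtic FC</text></label>
  <label><group>Action</group><text>Positional attacks</text></label>
  <label><group>Half</group><text>2nd half</text></label>
</instance>
""")

# "0" is treated as a tag name, so nothing matches and find() returns None.
missing = inst.find("label/0/text")

# Positional predicates go in square brackets and are 1-based.
team = inst.findtext("label[1]/text")
action = inst.findtext("label[2]/text")
half = inst.findtext("label[3]/text")
```

This only works if every instance lists its labels in the same order, which the sample data does but the full file may not.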
I want the <text> value under each <label> to end up in the column named by that label's <group>.
I've tried calling .text on the result of i.find(), but that fails with a NoneType error ('NoneType' object has no attribute 'text').
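To make the mapping I'm after concrete, here is a rough sketch rather than a definitive solution. It assumes each `<label>` has exactly one `<group>` and one `<text>`, and it wraps a trimmed sample in a hypothetical `<file>` root so the string parses; the helper name `instance_to_row` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <file> wrapper is an assumption about the real document's root.
SAMPLE = """<file>
  <instance>
    <ID>1</ID><start>0</start><end>16.56</end>
    <code>8. Kego Furuhasi</code>
    <label><group>Team</group><text>Celtic FC</text></label>
    <label><group>Action</group><text>Positional attacks</text></label>
    <label><group>Half</group><text>2nd half</text></label>
    <pos_x>52.5</pos_x><pos_y>34.0</pos_y>
  </instance>
  <instance>
    <ID>3</ID><start>0</start><end>18.8</end>
    <code>42. Kollum MakGregor</code>
    <label><group>Team</group><text>Celtic FC</text></label>
    <label><group>Action</group><text>Passes accurate</text></label>
    <label><group>Half</group><text>2nd half</text></label>
    <pos_x>45.2</pos_x><pos_y>34.3</pos_y>
  </instance>
</file>"""


def instance_to_row(inst):
    """Flatten one <instance>, keying each label's <text> by its <group>."""
    labels = {
        lab.findtext("group"): lab.findtext("text")
        for lab in inst.findall("label")
    }
    return {
        "ID": inst.findtext("ID"),
        "Start": inst.findtext("start"),
        "End": inst.findtext("end"),
        "Player": inst.findtext("code"),
        # .get() yields None instead of raising if a group is absent.
        "Team": labels.get("Team"),
        "Action": labels.get("Action"),
        "Half": labels.get("Half"),
        "x": inst.findtext("pos_x"),
        "y": inst.findtext("pos_y"),
    }


root = ET.fromstring(SAMPLE)
rows = [instance_to_row(inst) for inst in root.iter("instance")]
```

The resulting `rows` list has the same shape as the one in my code, so it could be passed to `pd.DataFrame(rows, columns=cols)` unchanged. Keying on `<group>` rather than position means the label order inside an instance should not matter.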
