I'm having some trouble parsing an XML file. After searching on here I've got close to what I need, but I'm having trouble extracting some of the more deeply nested data.

This is my XML data:

xml = """
<instance>
      <ID>1</ID>
      <start>0</start>
      <end>16.56</end>
      <code>8. Kego Furuhasi</code>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Positional attacks</text>
      </label>
      <label>
         <group>Half</group>
         <text>2nd half</text>
      </label>
      <pos_x>52.5</pos_x>
      <pos_y>34.0</pos_y>
   </instance>
   <instance>
      <ID>2</ID>
      <start>0</start>
      <end>16.56</end>
      <code>8. Kego Furuhasi</code>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Passes accurate</text>
      </label>
      <label>
         <group>Half</group>
         <text>2nd half</text>
      </label>
      <pos_x>52.5</pos_x>
      <pos_y>34.0</pos_y>
   </instance>
   <instance>
      <ID>3</ID>
      <start>0</start>
      <end>18.8</end>
      <code>42. Kollum MakGregor</code>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Passes accurate</text>
      </label>
      <label>
         <group>Half</group>
         <text>2nd half</text>
      </label>
      <pos_x>45.2</pos_x>
      <pos_y>34.3</pos_y>
   </instance>
"""

And my current code:

import xml.etree.ElementTree as Xet
import pandas as pd
  
cols = ["ID", "Start", "End", "Player", "Team", "Action","Half","x","y"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse(r'/content/gdrive/MyDrive/Celtic_Dundee.xml')
root = xmlparse.getroot()
for i in root:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player= i.find("code").text
    Team = i.find("label/0/text")
    Action = i.find("label/1/text")
    Half = i.find("label/2/text")
    x = i.find("pos_x")
    y = i.find("pos_y")
    
  
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Player,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('output.csv')

Running that code produces a CSV, but the Team, Action and Half columns come out empty.


I want the <text> under each <label> to go into the column named by that label's <group>. I've tried adding .text to those i.find() calls, but that raises a NoneType error because find() returns None for those paths.
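A minimal check of why those label lookups come back empty, using one instance trimmed from the sample above (only two of its labels are kept, for brevity). It relies only on the standard library's ElementTree path syntax; the variable names are illustrative.

```python
import xml.etree.ElementTree as Xet

# One <instance> from the sample above, trimmed to two labels for brevity.
INSTANCE_XML = """<instance>
  <ID>1</ID>
  <label><group>Team</group><text>Celtic FC</text></label>
  <label><group>Action</group><text>Positional attacks</text></label>
</instance>"""

instance = Xet.fromstring(INSTANCE_XML)

# "0" is read as a child tag name, not an index, so nothing matches and
# find() returns None; calling .text on that None raises AttributeError.
missing = instance.find("label/0/text")
print(missing)

# ElementTree paths support 1-based positional predicates:
# label[1] is the first <label> child, label[2] the second.
team = instance.find("label[1]/text").text
action = instance.find("label[2]/text").text
print(team, "|", action)
```

In the original loop the None is stored without calling .text, which is why those CSV columns end up blank rather than raising an error.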

Answer

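You're almost there; there are just a few hiccups. ElementTree paths don't accept a numeric step like label/0/text (the 0 is read as a tag name, so find() returns None), but they do support 1-based positional predicates such as label[1]. pos_x and pos_y also need .text. Try changing your for loop to: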

for i in root:
    #no change in the first 4 items:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player= i.find("code").text
    #changes from here:
    Team = i.findall("./label[1]/text")[0].text
    Action = i.findall("./label[2]/text")[0].text
    Half = i.findall("./label[3]/text")[0].text
    x = i.find("pos_x").text
    y = i.find("pos_y").text    
  
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Player,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})

Given the xml in your question, I get this output:

   ID  Start    End                Player       Team              Action      Half     x     y
0   1      0  16.56      8. Kego Furuhasi  Celtic FC  Positional attacks  2nd half  52.5  34.0
1   2      0  16.56      8. Kego Furuhasi  Celtic FC     Passes accurate  2nd half  52.5  34.0
2   3      0   18.8  42. Kollum MakGregor  Celtic FC     Passes accurate  2nd half  45.2  34.3
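If the order of the <label> elements can differ between instances, positional predicates will silently pick the wrong label. A sketch of an alternative that selects each label by its <group> value with ElementTree's [tag='text'] predicate. Assumptions: the <root> wrapper is hypothetical (the sample has several top-level <instance> elements, which is not well-formed as a single document), and the labels in this copy of instance 3 are deliberately reordered to show that order no longer matters.

```python
import xml.etree.ElementTree as Xet

# Hypothetical <root> wrapper around instance 3 from the question;
# its labels are deliberately reordered (Half, Team, Action).
SAMPLE = """<root>
  <instance>
    <ID>3</ID>
    <start>0</start>
    <end>18.8</end>
    <code>42. Kollum MakGregor</code>
    <label><group>Half</group><text>2nd half</text></label>
    <label><group>Team</group><text>Celtic FC</text></label>
    <label><group>Action</group><text>Passes accurate</text></label>
    <pos_x>45.2</pos_x>
    <pos_y>34.3</pos_y>
  </instance>
</root>"""

root = Xet.fromstring(SAMPLE)
rows = []
for inst in root.iter("instance"):
    # findtext() returns the element's text, or None if the path matches nothing.
    row = {
        "ID": inst.findtext("ID"),
        "Start": inst.findtext("start"),
        "End": inst.findtext("end"),
        "Player": inst.findtext("code"),
        "x": inst.findtext("pos_x"),
        "y": inst.findtext("pos_y"),
    }
    # label[group='Team'] keeps only the <label> whose <group> text is exactly
    # "Team", so the position of the label inside <instance> does not matter.
    for group in ("Team", "Action", "Half"):
        row[group] = inst.findtext(f"label[group='{group}']/text")
    rows.append(row)

print(rows[0]["Team"], rows[0]["Action"], rows[0]["Half"])
```

A missing label yields None instead of an IndexError, and rows can be passed to pd.DataFrame(rows, columns=cols) exactly as in the question. The predicate compares the full <group> text, so stray whitespace inside <group> would stop it matching.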

Comments

Thank you so much for your reply, Jack. Unfortunately, I'm getting "IndexError: list index out of range" when I run that.
If you're getting this error with your actual XML, it means the sample XML in your question is not representative of it. The code in the answer definitely works with the sample in the question.
Thanks, I'll look into why I'm getting that error; I only used a snippet of my code for the example.
@JayRSP Glad to help.
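One way to track down that IndexError is to list every <instance> whose labels are not exactly Team, Action, Half in that order, since findall(...)[0] in the answer assumes that layout. A minimal sketch: the data is hypothetical (instance 2 has no Half label on purpose) and the function name label_problems is illustrative. It walks root.iter("instance"), so it also finds <instance> elements that are not direct children of the root, which `for i in root` would miss.

```python
import xml.etree.ElementTree as Xet

# Hypothetical data: instance 2 has no "Half" label, the kind of record
# that makes findall("./label[3]/text")[0] raise IndexError.
SAMPLE = """<root>
  <instance>
    <ID>1</ID>
    <label><group>Team</group><text>Celtic FC</text></label>
    <label><group>Action</group><text>Positional attacks</text></label>
    <label><group>Half</group><text>2nd half</text></label>
  </instance>
  <instance>
    <ID>2</ID>
    <label><group>Team</group><text>Celtic FC</text></label>
    <label><group>Action</group><text>Passes accurate</text></label>
  </instance>
</root>"""

# Label layout the positional approach relies on.
EXPECTED = ("Team", "Action", "Half")


def label_problems(root):
    """Yield (ID, groups found) for each <instance> whose labels differ from EXPECTED."""
    for inst in root.iter("instance"):
        groups = tuple(label.findtext("group") for label in inst.findall("label"))
        if groups != EXPECTED:
            yield inst.findtext("ID"), groups


root = Xet.fromstring(SAMPLE)
for instance_id, groups in label_problems(root):
    print(f"ID {instance_id}: labels {groups}")
```

Running this against the real file (via Xet.parse(...).getroot()) would point at the records that break the positional version.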
