Parsing XML and converting to CSV in Python

Question

I'm having some trouble parsing an XML file. After searching on here I've got close to what I need, but I'm having issues with unnesting some of the deeper data.

This is my XML data:

xml = """
<instance>
      <ID>1</ID>
      <start>0</start>
      <end>16.56</end>
      <code>8. Kego Furuhasi</code>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Positional attacks</text>
      </label>
      <label>
         <group>Half</group>
         <text>2nd half</text>
      </label>
      <pos_x>52.5</pos_x>
      <pos_y>34.0</pos_y>
   </instance>
   <instance>
      <ID>2</ID>
      <start>0</start>
      <end>16.56</end>
      <code>8. Kego Furuhasi</code>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Passes accurate</text>
      </label>
      <label>
         <group>Half</group>
         <text>2nd half</text>
      </label>
      <pos_x>52.5</pos_x>
      <pos_y>34.0</pos_y>
   </instance>
   <instance>
      <ID>3</ID>
      <start>0</start>
      <end>18.8</end>
      <code>42. Kollum MakGregor</code>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Passes accurate</text>
      </label>
      <label>
         <group>Half</group>
         <text>2nd half</text>
      </label>
      <pos_x>45.2</pos_x>
      <pos_y>34.3</pos_y>
   </instance>
"""

And my current code:

import xml.etree.ElementTree as Xet
import pandas as pd
  
cols = ["ID", "Start", "End", "Player", "Team", "Action","Half","x","y"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse(r'/content/gdrive/MyDrive/Celtic_Dundee.xml')
root = xmlparse.getroot()
for i in root:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player= i.find("code").text
    Team = i.find("label/0/text")
    Action = i.find("label/1/text")
    Half = i.find("label/2/text")
    x = i.find("pos_x")
    y = i.find("pos_y")
    
  
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Code,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})
  
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('output.csv')

Running that code gives me a CSV, but it has errors in it: the Team, Action and Half columns have no data in them.

I want the <text> value under each <label> to end up in the column named by its <group>. I've tried using i.find().text, but it raises a NoneType error.

Jack Fleeting · Accepted Answer · 2021-08-30 20:32:42Z

You're almost there, just a few hiccups. Try changing your for loop to:

for i in root:
    #no change in the first 4 items:
    ID = i.find("ID").text
    Start = i.find("start").text
    End = i.find("end").text
    Player= i.find("code").text
    #changes from here:
    Team = i.findall("./label[1]/text")[0].text
    Action = i.findall("./label[2]/text")[0].text
    Half = i.findall("./label[3]/text")[0].text
    x = i.find("pos_x").text
    y = i.find("pos_y").text    
  
    rows.append({"ID": ID,
                 "Start": Start,
                 "End": End,
                 "Player": Player,
                 "Team": Team,
                 "Action": Action,
                 "Half": Half,
                 "x": x,
                 "y": y})

Given the xml in your question, I get this output:

    ID  Start  End    Player                Team       Action              Half      x     y
0   1   0      16.56  8. Kego Furuhasi      Celtic FC  Positional attacks  2nd half  52.5  34.0
1   2   0      16.56  8. Kego Furuhasi      Celtic FC  Passes accurate     2nd half  52.5  34.0
2   3   0      18.8   42. Kollum MakGregor  Celtic FC  Passes accurate     2nd half  45.2  34.3
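
The two changes that matter are the 1-based positional predicate (label[1] rather than label/0) and the trailing .text. Here is a minimal sketch of the difference. It assumes the instances sit inside a single top-level element; the <root> wrapper and the shortened sample are assumptions for illustration, since the snippet in the question has no single root.

```python
import xml.etree.ElementTree as ET

# Shortened sample. The <root> wrapper is an assumption: the snippet in the
# question has no single top-level element, which the parser requires.
SAMPLE = """
<root>
   <instance>
      <ID>1</ID>
      <label>
         <group>Team</group>
         <text>Celtic FC</text>
      </label>
      <label>
         <group>Action</group>
         <text>Positional attacks</text>
      </label>
      <pos_x>52.5</pos_x>
   </instance>
</root>
"""

inst = ET.fromstring(SAMPLE).find("instance")

# "label/0/text" looks for a child element literally named <0> under <label>,
# so nothing matches and find() returns None.
by_zero_index = inst.find("label/0/text")

# Positional predicates are 1-based: label[1] is the first <label> child.
team = inst.find("label[1]/text").text
action = inst.find("label[2]/text").text

# find() returns an Element object; .text extracts its string content.
pos_x_element = inst.find("pos_x")
pos_x = pos_x_element.text
```

Without .text, the Element objects themselves end up in the rows, which is probably why the original x and y columns did not contain plain numbers.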

4 Comments

JayRSP Over a year ago

Thank you so much for your reply, Jack. Unfortunately, I'm getting an "IndexError: list index out of range" error when running that.

Jack Fleeting Over a year ago

If you are getting this error using your actual xml, it means the sample xml in your question is not representative of the actual xml. The code in the answer definitely works with the sample in the question.
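
One plausible cause of that IndexError is an instance with fewer than three <label> children, so that findall("./label[3]/text") returns an empty list. The sketch below shows one way to locate such instances and to read the value without raising. The second instance, with a single label, is made up for illustration, as is the <root> wrapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: instance 2 has only one <label>, unlike the sample
# in the question. The <root> wrapper is also an assumption.
SAMPLE = """
<root>
   <instance>
      <ID>1</ID>
      <label><group>Team</group><text>Celtic FC</text></label>
      <label><group>Action</group><text>Passes accurate</text></label>
      <label><group>Half</group><text>2nd half</text></label>
   </instance>
   <instance>
      <ID>2</ID>
      <label><group>Team</group><text>Celtic FC</text></label>
   </instance>
</root>
"""

root = ET.fromstring(SAMPLE)

# IDs of instances that would make findall("./label[3]/text")[0] fail.
short_ids = [
    inst.findtext("ID")
    for inst in root.iter("instance")
    if len(inst.findall("label")) < 3
]

# findtext() returns the default instead of raising when nothing matches.
halves = [
    inst.findtext("./label[3]/text", default="")
    for inst in root.iter("instance")
]
```

Printing short_ids for the real file should show which instances differ from the sample.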

JayRSP Over a year ago

Thanks, I'll have a look through why I'm getting that error - I was using just a snippet of my code for the example.

Jack Fleeting Over a year ago

@JayRSP Glad to help.
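
If the labels in the real file are not always present or not always in the same order, positional paths such as label[2] can fail or pick up the wrong value. One alternative is to build a dictionary keyed by <group> and look values up by name. The sketch below assumes that each group name appears at most once per instance. The helper name instance_to_row and the <root> wrapper are hypothetical, and the labels are deliberately reordered to show that position no longer matters. It uses the standard csv module rather than pandas to stay self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLS = ["ID", "Start", "End", "Player", "Team", "Action", "Half", "x", "y"]

# First instance from the question, with its labels reordered on purpose.
SAMPLE = """
<root>
   <instance>
      <ID>1</ID>
      <start>0</start>
      <end>16.56</end>
      <code>8. Kego Furuhasi</code>
      <label><group>Half</group><text>2nd half</text></label>
      <label><group>Team</group><text>Celtic FC</text></label>
      <label><group>Action</group><text>Positional attacks</text></label>
      <pos_x>52.5</pos_x>
      <pos_y>34.0</pos_y>
   </instance>
</root>
"""

def instance_to_row(inst):
    """Build one CSV row; labels are matched by <group>, not by position."""
    labels = {
        label.findtext("group"): label.findtext("text")
        for label in inst.findall("label")
    }
    return {
        "ID": inst.findtext("ID"),
        "Start": inst.findtext("start"),
        "End": inst.findtext("end"),
        "Player": inst.findtext("code"),
        # .get() yields an empty string when a group is absent.
        "Team": labels.get("Team", ""),
        "Action": labels.get("Action", ""),
        "Half": labels.get("Half", ""),
        "x": inst.findtext("pos_x"),
        "y": inst.findtext("pos_y"),
    }

rows = [instance_to_row(inst) for inst in ET.fromstring(SAMPLE).iter("instance")]

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=COLS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The same rows list can be passed to pd.DataFrame(rows, columns=cols) as in the question if pandas is preferred.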
