How to read XML file into Pandas Dataframe

Question

I have a xml file: 'product.xml' that I want to read using pandas, here is an example of the sample file:

<?xml version="1.0"?>
 <Rowset>
  <ROW>
   <Product_ID>32</Product_ID>
   <Company_ID>2</Company_ID>
   <User_ID>90</User_ID>
   <Product_Type>1</Product_Type>
   <Application_ID>BBC#:1010</Application_ID>
  </ROW>
  <ROW>
   <Product_ID>22</Product_ID>
   <Company_ID>4</Company_ID>
   <User_ID>190</User_ID>
   <Product_Type>2</Product_Type>
   <Application_ID>NBA#:1111</Application_ID>
  </ROW>
  <ROW>
   <Product_ID>63</Product_ID>
   <Company_ID>4</Company_ID>
   <User_ID>99</User_ID>
   <Product_Type>1</Product_Type>
   <Application_ID>BBC#:1212</Application_ID>
  </ROW>
  <ROW>
   <Product_ID>22</Product_ID>
   <Company_ID>2</Company_ID>
   <User_ID>65</User_ID>
   <Product_Type>2</Product_Type>
   <Application_ID>NBA#:2210</Application_ID>
  </ROW>

I am trying to generate a 2D Table using pandas like:

Application ID Product Type  Product ID
BBC#:1010            1         32
NBA#:1111            2         22
BBC#:1212            1         63  
NBA#:2210            2         22

so far, I have tried this code:

import xml.etree.cElementTree as ET
import pandas as pd
import pandas_read_xml as pdx
df = pdx.read_xml('product.xml')
path='product.xml'
dfcols = ['Application_ID', 'Product_Type', 'Product_ID']
root = et.parse(path)
rows = root.findall('.//ROW')
#NESTED LIST
xml_data = [[row.get('Application_ID'), row.get('Product_Type'), row.get('Product_ID')] for row in rows]
df_xml = pd.DataFrame(xml_data, columns=dfcols)

print(df_xml)

How can I print that type of 2D Table?, please help, Thank you.

Welcome to Stack Overflow. Please be specific: what exactly does "this is not working" mean? — mzjn
– mzjn, Commented Oct 4, 2021 at 7:40
But this is not working please provide more information, what do you mean by is not working? Does code you created raise error when run? If yes what is that error? If not and you got wrong answer provide said answer and write how it should be different that currently is — Daweo
– Daweo, Commented Oct 4, 2021 at 7:40

Corralien · Accepted Answer · 2021-10-04 07:45:20Z

2

Use [] to filter and reorganize columns:

cols = ['Application_ID', 'Product_Type', 'Product_ID']
df = pd.read_xml('product.xml')[cols]
print(df)

# Output:
  Application_ID  Product_Type  Product_ID
0      BBC#:1010             1          32
1      NBA#:1111             2          22
2      BBC#:1212             1          63
3      NBA#:2210             2          22

If you want to replace '_' from your column names by ' ':

df.columns = df.columns.str.replace('_', ' ')
print(df)

# Output:
  Application ID  Product Type  Product ID
0      BBC#:1010             1          32
1      NBA#:1111             2          22
2      BBC#:1212             1          63
3      NBA#:2210             2          22

answered Oct 4, 2021 at 7:45

Corralien

121k8 gold badges44 silver badges69 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

p_sutherland · Accepted Answer · 2021-10-25 21:02:26Z

2

As of Pandas 1.3.0 there is a read_xml() function that makes working with reading/writing XML data in/out of pandas much easier.

Once you upgrade to Pandas >1.3.0 you can simply use:

df = pd.read_xml("___XML_FILEPATH___")
print(df)

(Note that in the XML sample above the <Rowset> tag needs to be closed)

answered Oct 25, 2021 at 21:02

p_sutherland

4911 gold badge11 silver badges23 bronze badges

Collectives™ on Stack Overflow

How to read XML file into Pandas Dataframe

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related