3

I have an XML file, 'product.xml', that I want to read using pandas. Here is a sample of the file:

<?xml version="1.0"?>
 <Rowset>
  <ROW>
   <Product_ID>32</Product_ID>
   <Company_ID>2</Company_ID>
   <User_ID>90</User_ID>
   <Product_Type>1</Product_Type>
   <Application_ID>BBC#:1010</Application_ID>
  </ROW>
  <ROW>
   <Product_ID>22</Product_ID>
   <Company_ID>4</Company_ID>
   <User_ID>190</User_ID>
   <Product_Type>2</Product_Type>
   <Application_ID>NBA#:1111</Application_ID>
  </ROW>
  <ROW>
   <Product_ID>63</Product_ID>
   <Company_ID>4</Company_ID>
   <User_ID>99</User_ID>
   <Product_Type>1</Product_Type>
   <Application_ID>BBC#:1212</Application_ID>
  </ROW>
  <ROW>
   <Product_ID>22</Product_ID>
   <Company_ID>2</Company_ID>
   <User_ID>65</User_ID>
   <Product_Type>2</Product_Type>
   <Application_ID>NBA#:2210</Application_ID>
  </ROW>

I am trying to generate a 2D table using pandas, like this:

Application ID  Product Type  Product ID
BBC#:1010       1             32
NBA#:1111       2             22
BBC#:1212       1             63
NBA#:2210       2             22

So far, I have tried this code:

import xml.etree.cElementTree as ET
import pandas as pd
import pandas_read_xml as pdx
df = pdx.read_xml('product.xml')
path='product.xml'
dfcols = ['Application_ID', 'Product_Type', 'Product_ID']
root = et.parse(path)
rows = root.findall('.//ROW')
#NESTED LIST
xml_data = [[row.get('Application_ID'), row.get('Product_Type'), row.get('Product_ID')] for row in rows]
df_xml = pd.DataFrame(xml_data, columns=dfcols)

print(df_xml)

How can I print a 2D table like this? Please help. Thank you.

Comments

  • Welcome to Stack Overflow. Please be specific: what exactly does "this is not working" mean? Commented Oct 4, 2021 at 7:40
  • "But this is not working": please provide more information. What do you mean by "not working"? Does the code you wrote raise an error when run? If so, what is the error? If not, and you got a wrong result, show that result and explain how it should differ from what you currently get. Commented Oct 4, 2021 at 7:40

2 Answers

2

Read the file with pd.read_xml(), then use [] to select and reorder the columns:

cols = ['Application_ID', 'Product_Type', 'Product_ID']
df = pd.read_xml('product.xml')[cols]
print(df)

# Output:
  Application_ID  Product_Type  Product_ID
0      BBC#:1010             1          32
1      NBA#:1111             2          22
2      BBC#:1212             1          63
3      NBA#:2210             2          22

If you want to replace '_' with ' ' in your column names:

df.columns = df.columns.str.replace('_', ' ')
print(df)

# Output:
  Application ID  Product Type  Product ID
0      BBC#:1010             1          32
1      NBA#:1111             2          22
2      BBC#:1212             1          63
3      NBA#:2210             2          22
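
For reference, the attempt in the question likely yields None values because Element.get() looks up XML attributes, whereas here the values are the text of child elements (it also imports the module as ET but calls et.parse, and xml.etree.cElementTree was removed in Python 3.9). Below is a minimal sketch of an ElementTree-only alternative, assuming the file is well-formed; the names XML_TEXT, COLS and rows_to_records are illustrative and not part of the original post:

```python
import xml.etree.ElementTree as ET

# Two rows of the question's sample, with the closing </Rowset> tag added
# (an assumption: the sample as posted is not well-formed).
XML_TEXT = """<?xml version="1.0"?>
<Rowset>
  <ROW>
    <Product_ID>32</Product_ID>
    <Company_ID>2</Company_ID>
    <User_ID>90</User_ID>
    <Product_Type>1</Product_Type>
    <Application_ID>BBC#:1010</Application_ID>
  </ROW>
  <ROW>
    <Product_ID>22</Product_ID>
    <Company_ID>4</Company_ID>
    <User_ID>190</User_ID>
    <Product_Type>2</Product_Type>
    <Application_ID>NBA#:1111</Application_ID>
  </ROW>
</Rowset>
"""

COLS = ["Application_ID", "Product_Type", "Product_ID"]


def rows_to_records(xml_text, cols):
    """Return one list of child-element texts per <ROW>, in the order of cols."""
    root = ET.fromstring(xml_text)
    # findtext() reads the text of the named child element;
    # get() would look for an attribute on <ROW> and return None here.
    return [[row.findtext(col) for col in cols] for row in root.findall(".//ROW")]


records = rows_to_records(XML_TEXT, COLS)
print(records)
```

Passing records to pd.DataFrame(records, columns=COLS) then builds the table. Note that all values come back as strings with this approach, so numeric columns would need converting if that matters.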

2

As of pandas 1.3.0, there is a read_xml() function (along with a DataFrame.to_xml() method) that makes reading and writing XML data with pandas much easier.

Once you upgrade to pandas >= 1.3.0, you can simply use:

df = pd.read_xml("___XML_FILEPATH___")
print(df)

(Note that in the XML sample above, the <Rowset> tag needs to be closed with </Rowset>; otherwise the file is not well-formed XML.)
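
To illustrate the point about the unclosed tag, here is a small sketch that checks whether a string is well-formed XML using the standard library. The helper name is_well_formed and the shortened sample strings are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Shortened version of the sample: one row, <Rowset> left open as in the question.
unclosed = "<Rowset><ROW><Product_ID>32</Product_ID></ROW>"
closed = unclosed + "</Rowset>"

print(is_well_formed(unclosed))
print(is_well_formed(closed))
```

Running a check like this before loading the file can make the cause of a parse failure easier to spot.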
