The XML file (documents/pythonstore.xml):

<?xml version="1.0"?>
 <productListing title="Python Products">
  <product id="1">
   <name>Python Hoodie</name>
   <description>This is a Hoodie</description>
   <cost>$49.99</cost>
   <shipping>$2.00</shipping>
  </product>
  <product id="2">
   <name>Python shirt</name>
   <description>This is a shirt</description>
   <cost>$79.99</cost>
   <shipping>$4.00</shipping>
  </product> 
  <product id="3">
   <name>Python cap</name>
   <description>This is a cap</description>
   <cost>$99.99</cost>
   <shipping>$3.00</shipping>
  </product> 
</productListing>

import xml.etree.ElementTree as et
import pandas as pd
import numpy as np

First I import the libraries.

tree = et.parse("documents/pythonstore.xml")

The file is stored in the documents folder.

root = tree.getroot()
for a in range(3):
    for b in range(4):
        new = root[a][b].text
        print(new)

This prints the text of every child element in the XML.

df=pd.DataFrame(columns=['name','description','cost','shipping'])

I created an empty DataFrame to hold the child elements of the XML.

My questions:

  • How can I turn the new variable into a list? I tried append and the list function, but both failed.
  • How do I use a for loop to put the children into the DataFrame?

Could somebody please help me? Thank you so much!

1 Answer

This might help.

# -*- coding: utf-8 -*-
s = """<?xml version="1.0"?>
 <productListing title="Python Products">
  <product id="1">
   <name>Python Hoodie</name>
   <description>This is a Hoodie</description>
   <cost>$49.99</cost>
   <shipping>$2.00</shipping>
  </product>
  <product id="2">
   <name>Python shirt</name>
   <description>This is a shirt</description>
   <cost>$79.99</cost>
   <shipping>$4.00</shipping>
  </product> 
  <product id="3">
   <name>Python cap</name>
   <description>This is a cap</description>
   <cost>$99.99</cost>
   <shipping>$3.00</shipping>
  </product> 
</productListing>"""

import xml.etree.ElementTree as et
import pandas as pd

# fromstring() returns the root element directly
root = et.fromstring(s)

res = []
for a in range(3):          # three <product> elements
    r = []
    for b in range(4):      # four child elements per product
        new = root[a][b].text
        r.append(new)
    res.append(r)

print(res)
df = pd.DataFrame(res, columns=['name', 'description', 'cost', 'shipping'])
print(df)

Output:

[['Python Hoodie', 'This is a Hoodie', '$49.99', '$2.00'], ['Python shirt', 'This is a shirt', '$79.99', '$4.00'], ['Python cap', 'This is a cap', '$99.99', '$3.00']]

            name       description    cost shipping
0  Python Hoodie  This is a Hoodie  $49.99    $2.00
1   Python shirt   This is a shirt  $79.99    $4.00
2     Python cap     This is a cap  $99.99    $3.00
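
On the first question: `new` is rebound to a single string on every pass through the loop, so there is no list to append to. The list has to exist before the loop starts. A minimal sketch of a flat list, assuming a shortened two-product copy of the XML; the names `xml_text` and `values` are illustrative and not from the question:

```python
import xml.etree.ElementTree as et

# Shortened copy of the XML from the question (two products only).
xml_text = """<?xml version="1.0"?>
<productListing title="Python Products">
  <product id="1">
    <name>Python Hoodie</name>
    <description>This is a Hoodie</description>
    <cost>$49.99</cost>
    <shipping>$2.00</shipping>
  </product>
  <product id="2">
    <name>Python shirt</name>
    <description>This is a shirt</description>
    <cost>$79.99</cost>
    <shipping>$4.00</shipping>
  </product>
</productListing>"""

root = et.fromstring(xml_text)

# Create the list once, outside the loops, then append each text value.
values = []
for product in root:
    for field in product:
        values.append(field.text)

print(values)
```

This gives one flat list of strings. The nested list in the answer (one inner list per product) is the shape that maps onto DataFrame rows.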

2 Comments

Thank you tons! This problem has been bugging me for hours.
This is a better way; sorry about the range function.

res = []
for child in root:
    r = []
    for element in child:
        new = element.text
        r.append(new)
    res.append(r)
print(res)
df = pd.DataFrame(res, columns=['name', 'description', 'cost', 'shipping'])
print(df)
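
Building on the loop in the comment above, the column names can probably also be taken from the tags themselves instead of being typed out by hand. A minimal sketch, assuming every product has the same child tags and using a shortened two-product copy of the XML; `xml_text` and `rows` are illustrative names:

```python
import xml.etree.ElementTree as et
import pandas as pd

# Shortened copy of the XML from the question (two products only).
xml_text = """<?xml version="1.0"?>
<productListing title="Python Products">
  <product id="1">
    <name>Python Hoodie</name>
    <description>This is a Hoodie</description>
    <cost>$49.99</cost>
    <shipping>$2.00</shipping>
  </product>
  <product id="2">
    <name>Python shirt</name>
    <description>This is a shirt</description>
    <cost>$79.99</cost>
    <shipping>$4.00</shipping>
  </product>
</productListing>"""

root = et.fromstring(xml_text)

# One dict per <product>, mapping each child tag to its text.
rows = [
    {field.tag: field.text for field in product}
    for product in root.findall("product")
]

# pandas builds the columns from the dict keys.
df = pd.DataFrame(rows)
print(df)
```

With a file on disk, `et.parse(path).getroot()` would take the place of `et.fromstring(xml_text)`. If some products were missing a tag, the corresponding cells would come out as NaN.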
