I have this XML and I want to parse it into a pandas DataFrame:
<DISTRITO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
<CPE>PT0002000022161425NP</CPE>
<CPE>PT0002000022161458JH</CPE>
<CPE>PT0002000022161471ZP</CPE>
<CPE>PT0002000022161505SL</CPE>
</DISTRITO>
And this is my Python code:
from lxml import objectify
from lxml import etree
import pandas as pd
path = '/TestFile.xml'
xml = objectify.parse(open(path))
root = xml.getroot()
data = []
for i in root:
    el_data = {}
    for child in root.getchildren():
        el_data[child.tag] = child.pyval
        # print el_data
        data.append(el_data)
df = pd.DataFrame(data)
The problem is that the result contains only the value of the last <CPE> node, repeated on every row:
                    CPE NOME_DISTRITO
0  PT0002000022161505SL      BRAGANCA
1  PT0002000022161505SL      BRAGANCA
2  PT0002000022161505SL      BRAGANCA
3  PT0002000022161505SL      BRAGANCA
4  PT0002000022161505SL      BRAGANCA
I dug into my XML file a little and found that this happens when several nodes share the same name. For example, if my file were this:
<DISTRITO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
<CPE1>PT0002000022161425NP</CPE1>
<CPE2>PT0002000022161458JH</CPE2>
<CPE3>PT0002000022161471ZP</CPE3>
<CPE4>PT0002000022161505SL</CPE4>
</DISTRITO>
there wouldn't be any problem. I have searched a lot but can't find a solution. Could you help me find another way to parse this file? I can't get it to work correctly.
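To make the symptom concrete, here is a minimal sketch that reproduces it outside my script. It uses the standard library's xml.etree.ElementTree instead of lxml.objectify (a swap made only so it runs without extra packages), and the names XML, overwritten, and rows are just for illustration. It shows that a dict keyed by tag keeps a single value per tag, and what the row layout I am after would look like. I have not confirmed that this is the right approach for the lxml version.

```python
import xml.etree.ElementTree as ET

# Same document as above, inlined so the sketch is self-contained.
XML = """<DISTRITO xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<NOME_DISTRITO>BRAGANCA</NOME_DISTRITO>
<CPE>PT0002000022161425NP</CPE>
<CPE>PT0002000022161458JH</CPE>
<CPE>PT0002000022161471ZP</CPE>
<CPE>PT0002000022161505SL</CPE>
</DISTRITO>"""

root = ET.fromstring(XML)

# A dict has one slot per key, so every repeated <CPE> tag
# overwrites the previous one and only the last value survives.
overwritten = {}
for child in root:
    overwritten[child.tag] = child.text

# Layout I actually want: one fresh dict per <CPE>,
# each carrying the shared district name.
district = root.findtext("NOME_DISTRITO")
rows = [
    {"NOME_DISTRITO": district, "CPE": cpe.text}
    for cpe in root.findall("CPE")
]

print(overwritten)
for row in rows:
    print(row)
```

If this is on the right track, passing rows to pd.DataFrame(rows) should give one row per CPE rather than the repeated last value.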
Thank you guys!