I understand this question has been asked a few times, but I've tried everything to no avail, and I'm not sure whether this is an edge case or I'm missing something. I'm trying to parse an XML file and return it as a pandas DataFrame. Below is my attempt:
import xml.etree.ElementTree as ET
import pandas as pd
from lxml import objectify
tree = ET.parse('file.xml')
root = tree.getroot()
The XML file looks like this:
<?xml version="1.0"?>
<document page-count="1">
<page number="1">
<table data-table="1" data-page="1" data-filename="Schedule.pdf">
<tr>
<td colspan="17">Wednesday 20th Mar</td>
</tr>
<tr>
<td colspan="3" style="text-align: right">1</td>
<td style="text-align: right">2</td>
<td style="text-align: right">3</td>
<td style="text-align: right">4</td>
<td style="text-align: right">5</td>
<td style="text-align: right">6</td>
<td style="text-align: right">7</td>
<td style="text-align: right">8</td>
<td style="text-align: right">9</td>
<td style="text-align: right">10</td>
<td style="text-align: right">11</td>
<td style="text-align: right">12</td>
<td style="text-align: right">13</td>
<td style="text-align: right">14</td>
<td style="text-align: right">15</td>
</tr>
<tr>
<td>HOME</td>
<td>D</td>
<td/>
<td/>
<td>08:00</td>
<td>09:00</td>
<td>10:00</td>
<td>11:00</td>
<td>12:00</td>
<td>13:00</td>
<td/>
<td/>
<td/>
<td colspan="4"/>
</tr>
</table>
</page>
</document>
I can print the parsed data as a string:
print(ET.tostring(root, encoding='utf8').decode('utf8'))
But when I try to load it into a DataFrame, I get a frame with no real data in it:
xml = objectify.parse('file.xml')
root = xml.getroot()
data = []
for i in range(len(root.getchildren())):
    data.append([child.text for child in root.getchildren()[i].getchildren()])
df = pd.DataFrame(data).T
Out:
      0
0  None
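A possible explanation for the single `None`, sketched with the standard-library parser instead of `objectify`: the loop above only descends two levels from `<document>`, so `child` ends up being the `<table>` element rather than a `<td>`. The inline `SAMPLE` string is a hypothetical, heavily trimmed stand-in for `file.xml` with the same nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for file.xml (same nesting, far fewer cells).
SAMPLE = "<document><page><table><tr><td>HOME</td><td/></tr></table></page></document>"
root = ET.fromstring(SAMPLE)

# Mirror the original loop: children of <document>, then their children.
two_levels_down = [child for page in root for child in page]
for child in two_levels_down:
    # The element reached here is <table>, which has no text of its own.
    print(child.tag, repr(child.text))

# Searching by tag name reaches the cells regardless of nesting depth.
cells = [td.text for td in root.iter("td")]
print(cells)
```

If that reading is right, the fix is to iterate over the `tr` and `td` elements directly instead of relying on a fixed number of `getchildren()` calls.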
With the date row stripped, the intended output is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 HOME D 08:00 09:00 10:00 11:00 12:00 13:00
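A minimal sketch of one way to get a frame shaped like the intended output, under several assumptions that are not confirmed by the file itself: the first `tr` is always the date row, the second `tr` is the header, `colspan` is ignored, and rows with fewer cells than the header are padded with empty strings. The helper names `table_rows` and `rows_to_frame` and the shortened `SAMPLE` string are hypothetical.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical, shortened stand-in for file.xml (same structure, fewer columns).
SAMPLE = """<?xml version="1.0"?>
<document page-count="1">
<page number="1">
<table data-table="1" data-page="1" data-filename="Schedule.pdf">
<tr><td colspan="17">Wednesday 20th Mar</td></tr>
<tr><td colspan="3">1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>
<tr><td>HOME</td><td>D</td><td/><td>08:00</td></tr>
</table>
</page>
</document>"""


def table_rows(root):
    """Return one list of cell strings per <tr>; an empty <td/> becomes ''."""
    return [[td.text or "" for td in tr.findall("td")] for tr in root.iter("tr")]


def rows_to_frame(rows):
    """Assumes rows[0] is the date row and rows[1] is the header row."""
    header, body = rows[1], rows[2:]
    width = len(header)
    # Pad (or trim) each data row to the header width; colspan is ignored.
    padded = [(row + [""] * width)[:width] for row in body]
    return pd.DataFrame(padded, columns=header)


root = ET.fromstring(SAMPLE)  # for the real file: ET.parse('file.xml').getroot()
rows = table_rows(root)
df = rows_to_frame(rows)
print(df)
```

Whether padding on the right is the correct alignment depends on how the `colspan` cells are meant to line up with the numbered columns, which the XML alone does not settle.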