I have multiple XML files that I would like to flatten, so I am looking for a generic function or logic to convert XML to a flat file. Most of the answers I found hard-code the tag names. The closest one is "Python : Flatten xml to csv with parent tag repeated in child", but it is still a hard-coded solution. For example, given the input XML below:
<root>
<child> child-val </child>
<child2> child2-val2 </child2>
<anotherchild>
<childid> another child 45</childid>
<childname> another child name </childname>
</anotherchild>
<group>
<groupid> groupid-123</groupid>
<grouplist>
<groupzone>
<groupname>first </groupname>
<groupsize> 4</groupsize>
</groupzone>
<groupzone>
<groupname>second </groupname>
<groupsize> 6</groupsize>
</groupzone>
<groupzone>
<groupname> third </groupname>
<groupsize> 8 </groupsize>
</groupzone>
</grouplist>
</group>
<secondgroup>
<secondgroupid> secondgroupid-42 </secondgroupid>
<secondgrouptitle> second group title </secondgrouptitle>
<secondgrouplist>
<secondgroupzone>
<secondgroupsub>
<secondsub>v1</secondsub>
<secondsubid>12</secondsubid>
</secondgroupsub>
<secondgroupname> third </secondgroupname>
<secondgroupsize> 4</secondgroupsize>
</secondgroupzone>
<secondgroupzone>
<secondgroupsub>
<secondsub>v2</secondsub>
<secondsubid>1</secondsubid>
</secondgroupsub>
<secondgroupname>fourth </secondgroupname>
<secondgroupsize> 6</secondgroupsize>
</secondgroupzone>
<secondgroupzone>
<secondgroupsub>
<secondsub>v3</secondsub>
<secondsubid>45</secondsubid>
</secondgroupsub>
<secondgroupname> tenth </secondgroupname>
<secondgroupsize> 10 </secondgroupsize>
</secondgroupzone>
</secondgrouplist>
</secondgroup>
<child3> val3 </child3>
</root>
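As a starting point for a generic approach, the column names can be derived from the tree itself instead of hard-coding tags: walk the elements recursively and join each leaf's ancestor tags with `|`, so that the children of anotherchild become anotherchild|childid and anotherchild|childname. The sketch below assumes the standard-library `xml.etree.ElementTree` parser; the function name `column_paths` is hypothetical, attributes are ignored, and the root tag is left out of the names.

```python
import xml.etree.ElementTree as ET


def column_paths(elem, prefix=""):
    """Return the distinct '|'-joined paths of all leaf elements under elem,
    in document order. Repeated siblings contribute their path only once."""
    paths = []
    for child in elem:
        path = f"{prefix}|{child.tag}" if prefix else child.tag
        # Recurse into elements that have children; a leaf is its own path.
        sub = column_paths(child, path) if len(child) else [path]
        for p in sub:
            if p not in paths:
                paths.append(p)
    return paths
```

Called on the parsed root (`column_paths(ET.fromstring(xml_content))`), this produces names such as group|grouplist|groupzone|groupname. Unlike the pandas-read-xml output, wrapper elements such as grouplist stay in the path.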
I tried the pandas-read-xml package. It gets most of the values, but the values of the anotherchild tag end up in a single column (anotherchild) instead of separate anotherchild|childid and anotherchild|childname columns. If possible, please suggest generic logic for converting any XML to a flat file.
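import pandas_read_xml as pdx
# Load the XML shown above ("input.xml" is a placeholder file name)
with open("input.xml", encoding="utf-8") as f:
    xml_content = f.read()
# Parse starting from the <root> key, then recursively flatten nested columns
df = pdx.read_xml(xml_content, ['root'])
fully_flattened_df = pdx.fully_flatten(df)
fully_flattened_df.to_csv("stack.csv", index=False)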
Output CSV:
anotherchild,child,child2,child3,group|groupzone|groupname,group|groupzone|groupsize,secondgroup|secondgroupzone|secondgroupname,secondgroup|secondgroupzone|secondgroupsize,secondgroup|secondgroupzone|secondgroupsub|secondsub,secondgroup|secondgroupzone|secondgroupsub|secondsubid
,child-val,child2-val2,val3,,,third,4,v1,12
,child-val,child2-val2,val3,,,fourth,6,v2,1
,child-val,child2-val2,val3,,,tenth,10,v3,45
,child-val,child2-val2,val3,first,4,,,,
,child-val,child2-val2,val3,second,6,,,,
,child-val,child2-val2,val3,third,8,,,,
another child 45,child-val,child2-val2,val3,,,,,,
another child name,child-val,child2-val2,val3,,,,,,
,child-val,child2-val2,val3,,,,,,
,child-val,child2-val2,val3,,,,,,
,child-val,child2-val2,val3,,,,,,
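One possible generic flattening, sketched with only the standard library (`xml.etree.ElementTree` and `csv`) rather than pandas-read-xml: leaf values become `|`-joined path columns, values that occur once under an element are copied into every row produced beneath it, and a tag that repeats (or that contains something that repeats) produces one row per occurrence. When an element has several independent repeating groups, as root does with group and secondgroup, this sketch stacks their rows instead of cross-joining them; that is a design choice, not the only reasonable one. The names `flatten` and `xml_to_csv` are hypothetical; attributes are ignored and text is whitespace-stripped.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten(elem, path=""):
    """Return a list of flat row dicts for elem; keys are '|'-joined tag paths."""
    if len(elem) == 0:
        return [{path: (elem.text or "").strip()}]
    # Group children by tag, keeping the order in which tags first appear.
    groups = {}
    for child in elem:
        groups.setdefault(child.tag, []).append(child)
    base = {}   # values shared by every row produced under elem
    multi = []  # row lists from repeated or multi-row children
    for tag, children in groups.items():
        child_path = f"{path}|{tag}" if path else tag
        rows = [row for c in children for row in flatten(c, child_path)]
        if len(rows) == 1:
            base.update(rows[0])
        else:
            multi.append(rows)
    if not multi:
        return [base]
    # Stack (union) the rows of independent repeated groups, copying the
    # shared values into each row, instead of cross-joining the groups.
    return [{**base, **row} for rows in multi for row in rows]


def xml_to_csv(xml_text):
    """Flatten an XML string and return it as CSV text with a header row."""
    rows = flatten(ET.fromstring(xml_text))
    # Header: every key in first-seen order; missing cells are left empty.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

To produce a file, write the returned string to it, e.g. open "stack.csv" with `newline=""` and call `write(xml_to_csv(xml_content))`. For the sample XML this gives six rows (three per zone group), with child, child2, child3 and both anotherchild columns filled on every row; a cross join would give nine rows instead.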