I have the following XML file, which I have to parse in order to extract data into a CSV file. The file contains two boxes (box_id), which are packed on two different parent objects (parent_box_id), as well as the details of the content of each box (element sgtin -> info_sgtin).
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<doc>
  <info id_reference="2">
    <data_down>
      <tree>
        <box_id>046071598600870568</box_id>
        <parent_box_id>046071598600875594</parent_box_id>
      </tree>
      <tree>
        <box_id>046071598600870575</box_id>
        <parent_box_id>046071598600875595</parent_box_id>
      </tree>
      <tree>
        <sgtin>
          <info_sgtin>
            <sgtin>04607008133585B0SE1HVHBGR3A</sgtin>
            <box_id>046071598600870568</box_id>
            <gtin>04607008133585</gtin>
            <series_number>026A</series_number>
          </info_sgtin>
        </sgtin>
        <parent_box_id>046071598600870568</parent_box_id>
      </tree>
      <tree>
        <sgtin>
          <info_sgtin>
            <sgtin>046070081335856F7P78HBVBEH2</sgtin>
            <box_id>046071598600870568</box_id>
            <gtin>04607008133585</gtin>
            <series_number>026A</series_number>
          </info_sgtin>
        </sgtin>
        <parent_box_id>046071598600870568</parent_box_id>
      </tree>
      <tree>
        <sgtin>
          <info_sgtin>
            <sgtin>046070081335854T61H7CSXDE9W</sgtin>
            <box_id>046071598600870575</box_id>
            <gtin>04607008133585</gtin>
            <series_number>026A</series_number>
          </info_sgtin>
        </sgtin>
        <parent_box_id>046071598600870575</parent_box_id>
      </tree>
    </data_down>
  </info>
</doc>
For this purpose I decided to use ElementTree in Python, but the problem is that my XML file has two variants of the tree tag: one that maps a box_id to its parent_box_id, and one that holds the sgtin details of a box's content.
First I iterate through all the details and capture the box_id value; after that I have to go to the parent item and get the parent_box_id in which this box_id is packed.
In other words I want to get the data in the following way:
parent_box_id box_id sgtin series_number
046071598600875594 046071598600870568 04607008133585B0SE1HVHBGR3A 026A
046071598600875594 046071598600870568 046070081335856F7P78HBVBEH2 026A
046071598600875595 046071598600870575 046070081335854T61H7CSXDE9W 026A
But I can't figure out how to get parent_box_id value. Would appreciate any support from the community.
Here is the code that I have:
import csv
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()

# Open the output once in write mode; newline='' lets the csv module control line endings
with open('result.csv', 'w', newline='') as myfile:
    writer = csv.writer(myfile, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for alist in root.iter('info_sgtin'):
        sgtin = alist.find('sgtin').text
        box_id = alist.find('box_id').text
        series = alist.find('series_number').text
        # parent_box_id is still missing from this row
        writer.writerow([sgtin, box_id, series])
Does parent_box_id need to be matched with box_id inside the first 2 tree elements and with the rest of the data?
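One possible way to do that matching, as a rough sketch rather than a definitive solution: read the document in two passes. The first pass builds a dictionary from box_id to parent_box_id using only the tree elements that have box_id as a direct child; the second pass walks the info_sgtin elements and looks up each box_id in that dictionary. The names `build_rows`, `write_rows` and `SAMPLE` are made up for this illustration, and the sketch assumes every box appears in exactly one mapping tree element.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (one item per box), illustration only.
SAMPLE = """<doc><info id_reference="2"><data_down>
<tree><box_id>046071598600870568</box_id><parent_box_id>046071598600875594</parent_box_id></tree>
<tree><box_id>046071598600870575</box_id><parent_box_id>046071598600875595</parent_box_id></tree>
<tree><sgtin><info_sgtin>
<sgtin>04607008133585B0SE1HVHBGR3A</sgtin><box_id>046071598600870568</box_id>
<gtin>04607008133585</gtin><series_number>026A</series_number>
</info_sgtin></sgtin><parent_box_id>046071598600870568</parent_box_id></tree>
<tree><sgtin><info_sgtin>
<sgtin>046070081335854T61H7CSXDE9W</sgtin><box_id>046071598600870575</box_id>
<gtin>04607008133585</gtin><series_number>026A</series_number>
</info_sgtin></sgtin><parent_box_id>046071598600870575</parent_box_id></tree>
</data_down></info></doc>"""


def build_rows(root):
    """Return [parent_box_id, box_id, sgtin, series_number] for every info_sgtin."""
    # Pass 1: find() only looks at direct children, so this matches the
    # box -> parent trees and skips the trees whose box_id is nested in info_sgtin.
    parent_of = {}
    for tree in root.iter('tree'):
        box = tree.find('box_id')
        parent = tree.find('parent_box_id')
        if box is not None and parent is not None:
            parent_of[box.text] = parent.text

    # Pass 2: one output row per item, with the parent looked up by box_id.
    rows = []
    for info in root.iter('info_sgtin'):
        box_id = info.findtext('box_id')
        rows.append([
            parent_of.get(box_id, ''),  # empty string if the box has no mapping (assumption)
            box_id,
            info.findtext('sgtin'),
            info.findtext('series_number'),
        ])
    return rows


def write_rows(rows, out):
    """Write the rows to any file-like object using the question's CSV settings."""
    writer = csv.writer(out, delimiter=';', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(['parent_box_id', 'box_id', 'sgtin', 'series_number'])
    writer.writerows(rows)


rows = build_rows(ET.fromstring(SAMPLE))
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

For the real file, `ET.parse('test.xml').getroot()` would replace `ET.fromstring(SAMPLE)`, and `open('result.csv', 'w', newline='')` would replace the `io.StringIO()` buffer.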