I am new to XML. Is there an efficient way to match text from a pandas DataFrame and use it to update an XML file?
Below is a small part of my large XML file; the rest of the file follows the same format.
XML file (input.xml):
<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
  <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
    <sec name="abcd" sound_freq="abcd" c_ty="pv">
      <feature number="48">
        <tfgt v="0.1466469683747654" y="0.0" units="sec" />
      </feature>
      <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
    </sec>
    <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
      <feature number="5748">
        <tfgt v="0.1466469683747654" y="0.0" units="sec" />
      </feature>
      <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
      </mwan>
    </sec>
    <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
      <feature number="5748">
        <tfgt v="0.1466469683747654" y="0.0" units="sec" />
      </feature>
      <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
        <xyz abc="trt" id="abc" />
        <per fre="acc" value="abc" />
        <per fre="xyz" value="abc" />
        <per fre="yy" value="abc" />
      </mwan>
    </sec>
    <!-- file continues... -->
  </design>
</brand>
Data frame (to use as input):
             name  Volum_5mb  Volum_40mb  Volum_70mb
1  M_20_K40745170      89.00       44.00       77.00
2  M_20_K40745171      77.00       65.00       94.00
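To experiment without the real data, the DataFrame above can be rebuilt in code (index 1 and 2, as printed) and turned into a dictionary keyed by name, so that each sec node can later be matched with a single dictionary lookup instead of a scan over df. This is a sketch; the variable name lookup is my own choice.

```python
import pandas as pd

# Rebuild the sample DataFrame from the question (index starts at 1, as printed).
df = pd.DataFrame(
    {
        "name": ["M_20_K40745170", "M_20_K40745171"],
        "Volum_5mb": [89.00, 77.00],
        "Volum_40mb": [44.00, 65.00],
        "Volum_70mb": [77.00, 94.00],
    },
    index=[1, 2],
)

# Map each name to {column: value} for the remaining columns,
# e.g. lookup["M_20_K40745170"]["Volum_5mb"] is 89.0.
lookup = df.set_index("name").to_dict(orient="index")
print(lookup["M_20_K40745170"])
```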
I would like to match the values in the name column against the name attribute of each sec element. If a name matches, the remaining columns should be added as new per elements inside that sec's mwan child. For example, if M_20_K40745170 from df['name'] matches, the corresponding node should get the following lines in the output file:
<per fre="Volum_5mb" value="89.00"/>
<per fre="Volum_40mb" value="44.00"/>
<per fre="Volum_70mb" value="77.00"/>
and so on.
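Since per is a new child element (not a new attribute), each matched row turns into one ElementTree SubElement call per column. A minimal sketch on a stand-in mwan element, assuming the values should be written with two decimals to match "89.00" in the desired output:

```python
import xml.etree.ElementTree as ET

# A stand-in mwan element, trimmed from the sample input for illustration.
mwan = ET.fromstring(
    '<mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz"></mwan>'
)

# Remaining columns of the matched row (values from the question's DataFrame).
row = {"Volum_5mb": 89.00, "Volum_40mb": 44.00, "Volum_70mb": 77.00}

for column, value in row.items():
    # One new <per> child per column: fre = column name, value = cell value.
    # Attribute values must be strings; .2f gives "89.00" rather than "89.0".
    ET.SubElement(mwan, "per", fre=column, value=f"{value:.2f}")

print(ET.tostring(mwan, encoding="unicode"))
```

SubElement appends at the end of the parent, so any existing per children stay in front of the new ones, which is the order shown in the desired output.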
I want the output file to look like this:
Desired XML (output.xml):
<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
  <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
    <sec name="abcd" sound_freq="abcd" c_ty="pv">
      <feature number="48">
        <tfgt v="0.1466469683747654" y="0.0" units="sec" />
      </feature>
      <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
    </sec>
    <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
      <feature number="5748">
        <tfgt v="0.1466469683747654" y="0.0" units="sec" />
      </feature>
      <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
        <per fre="Volum_5mb" value="89.00" />  <!-- new element -->
        <per fre="Volum_40mb" value="44.00" />  <!-- new element -->
        <per fre="Volum_70mb" value="77.00" />  <!-- new element -->
      </mwan>
    </sec>
    <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
      <feature number="5748">
        <tfgt v="0.1466469683747654" y="0.0" units="sec" />
      </feature>
      <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
        <xyz abc="trt" id="abc" />
        <per fre="acc" value="abc" />
        <per fre="xyz" value="abc" />
        <per fre="yy" value="abc" />
        <per fre="Volum_5mb" value="77.00" />  <!-- new element -->
        <per fre="Volum_40mb" value="65.00" />  <!-- new element -->
        <per fre="Volum_70mb" value="94.00" />  <!-- new element -->
      </mwan>
    </sec>
    <!-- file continues... -->
  </design>
</brand>
I am trying the xml.etree.ElementTree module:
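import xml.etree.ElementTree as ET

import pandas as pd  # df is the DataFrame shown above

tree = ET.parse('input.xml')
root = tree.getroot()
# For every row of df, scan all sec nodes and print the ones whose name matches.
for i in range(len(df)):
    for node in tree.findall("./design/sec"):
        name = node.attrib.get('name')
        # iloc is positional; the printed index starts at 1, so df.loc[0, ...] would raise KeyError
        if name == df['name'].iloc[i]:
            print(name)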
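A minimal sketch of the whole task, assuming the new per elements always go under the first mwan child of the matching sec: it turns the loop above inside out, visiting each sec node once and looking its name up in a dictionary built from df. The XML here is a trimmed copy of input.xml; the function name add_per_elements is hypothetical. ET.indent needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Trimmed copy of input.xml (feature elements and XML declaration omitted);
# for the real file use ET.parse("input.xml").getroot() instead.
XML = """<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
  <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
    <sec name="abcd" sound_freq="abcd" c_ty="pv">
      <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
    </sec>
    <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" s_c="0">
      <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
      </mwan>
    </sec>
    <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" s_c="0">
      <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
        <per fre="acc" value="abc" />
      </mwan>
    </sec>
  </design>
</brand>"""

df = pd.DataFrame(
    {
        "name": ["M_20_K40745170", "M_20_K40745171"],
        "Volum_5mb": [89.0, 77.0],
        "Volum_40mb": [44.0, 65.0],
        "Volum_70mb": [77.0, 94.0],
    },
    index=[1, 2],
)


def add_per_elements(root, df):
    """Append one <per fre=column value=cell /> per extra column to each matching sec/mwan."""
    # name -> {column: value}, built once so each sec is visited only once.
    lookup = df.set_index("name").to_dict(orient="index")
    for sec in root.iterfind("./design/sec"):
        row = lookup.get(sec.get("name"))
        if row is None:
            continue  # name not in df: leave this sec untouched
        mwan = sec.find("mwan")
        if mwan is None:
            continue  # assumption: new <per> elements belong under mwan
        for column, value in row.items():
            ET.SubElement(mwan, "per", fre=column, value=f"{value:.2f}")


root = ET.fromstring(XML)
add_per_elements(root, df)
tree = ET.ElementTree(root)
ET.indent(tree, space="  ")  # re-indent so the new children land on their own lines
tree.write("output.xml", encoding="UTF-8", xml_declaration=True)
```

ElementTree writes the declaration with single quotes (<?xml version='1.0' encoding='UTF-8'?>), which XML treats the same as the double-quoted form.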
I am new to processing XML with Python and have no idea how to add new elements to an XML file using a pandas DataFrame. Please help. Thanks and regards.
Read the xml.etree documentation, especially the section "Modifying an XML File", because your main problem has nothing to do directly with pandas. How about node.set("value", "89.00")?
Loop over the XML nodes and look up each name in df; this way you would search every node only once. In the current code you search the same node many times.
You can go straight to a node with node = tree.find('./design/sec[@name="M_20_K40745170"]') and later you can do node.find('./mwan/per[@fre="Volum_5mb"]').
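A small sketch of the comments' suggestion on a trimmed sec from input.xml: an ElementTree path with an attribute predicate jumps straight to the node, and set() overwrites an attribute that already exists. Note that find() returns a single element (or None), while findall() returns a list, and that per sits under mwan. For adding brand-new per elements, SubElement is the call to use instead of set().

```python
import xml.etree.ElementTree as ET

# Trimmed sec from the question's input.xml.
root = ET.fromstring(
    '<design><sec name="M_20_K40745171">'
    '<mwan first_name="gtftty"><per fre="acc" value="abc" /><per fre="xyz" value="abc" /></mwan>'
    "</sec></design>"
)

# find() returns one Element or None; findall() would return a list.
sec = root.find('./sec[@name="M_20_K40745171"]')
# The per elements are children of mwan, so the path goes through it.
per = sec.find('./mwan/per[@fre="acc"]')
if per is not None:
    per.set("value", "89.00")  # overwrite an existing attribute in place

print(ET.tostring(root, encoding="unicode"))
```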