
I am new to XML. Is there an efficient way to match text using a pandas data frame and update an XML file?

This is a small part of my large XML file which still follows the appropriate format.

XML file (input.xml):

<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
   <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
      <sec name="abcd" sound_freq="abcd" c_ty="pv">
         <feature number="48">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
      </sec>
      <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
        </mwan>
      </sec>
      <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
            <xyz abc="trt" id="abc" />
            <per fre="acc" value="abc" />
            <per fre="xyz" value="abc" />
            <per fre="yy" value="abc" />
         </mwan>
      </sec>
      #file continue....
   </design>
</brand>

Data frame (to use as input):

                name       Volum_5mb      Volum_40mb     Volum_70mb
1     M_20_K40745170         89.00           44.00         77.00
2     M_20_K40745171         77.00           65.00         94.00

I would like to match the values in the name column against the name attribute of the sec elements. When a name matches, the remaining columns should be added as new per elements under the corresponding node. For example, if M_20_K40745170 from df['name'] is matched, the corresponding node should be updated with the following lines in the output file:

<per fre="Volum_5mb" value="89.00"/>
<per fre="Volum_40mb" value="44.00"/>
<per fre="Volum_70mb" value="77.00"/>

and so on.

I want the output file to look like this:

Desired XML (output.xml):

<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
   <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
      <sec name="abcd" sound_freq="abcd" c_ty="pv">
         <feature number="48">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
      </sec>
      <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
            <per fre="Volum_5mb" value="89.00" />
            #new attribute FYI
            <per fre="Volum_40mb" value="44.00" />
            #new attribute FYI
            <per fre="Volum_70mb" value="77.00" />
            #new attribute FYI
         </mwan>
      </sec>
      <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
            <xyz abc="trt" id="abc" />
            <per fre="acc" value="abc" />
            <per fre="xyz" value="abc" />
            <per fre="yy" value="abc" />
            <per fre="Volum_5mb" value="77.00" />
            #new attribute FYI
            <per fre="Volum_40mb" value="65.00" />
            #new attribute FYI
            <per fre="Volum_70mb" value="94.00" />
            #new attribute FYI
         </mwan>
      </sec>
      #file continue....
   </design>
</brand>

I am trying the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET
tree = ET.parse('input.xml')
root = tree.getroot()
for i in range(len(df)):
    for node in tree.findall("./design/sec"):
        name = node.attrib.get('name')
        if name == df.loc[i, 'name']:
            print(name)

I am new to Python XML coding and do not know how to add new elements to an XML file from a pandas data frame. Please help. Thanks and regards.

  • First learn how to use xml.etree and read "Modifying an XML File" in its documentation, because your main problem has nothing to do directly with pandas. How about node.set("value", "89.00")? Commented Sep 15, 2020 at 9:12
  • You could first find all the nodes and then use them with df. That way you would search for every node only once; the current code searches for the same node many times. Commented Sep 15, 2020 at 9:18
  • BTW: node = tree.find('./design/sec[@name="M_20_K40745170"]') and later you can do node.find('.//per[@fre="Volum_5mb"]') Commented Sep 15, 2020 at 9:20
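The idea from the comments above, looking up every node only once, could be sketched roughly as follows. This is only an illustration under assumptions: the XML is a cut-down stand-in for input.xml, and the rows are plain dicts such as df.to_dict('records') would produce, so the sketch runs without pandas.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for input.xml (assumption: same layout as in the question).
xml_text = '''<brand>
  <design>
    <sec name="abcd"><mwan /></sec>
    <sec name="M_20_K40745170"><mwan /></sec>
  </design>
</brand>'''

# Stand-in for the data frame rows, e.g. df.to_dict('records').
records = [
    {'name': 'M_20_K40745170', 'Volum_5mb': 89.0},
    {'name': 'NOT_IN_XML', 'Volum_5mb': 1.0},
]

root = ET.fromstring(xml_text)

# Walk the tree once and remember each <sec> by its name attribute.
sec_by_name = {sec.get('name'): sec for sec in root.findall('./design/sec')}

matched = []
for row in records:
    sec = sec_by_name.get(row['name'])  # dictionary lookup, no tree search
    if sec is None:
        continue  # this name does not occur in the XML
    matched.append(sec.get('name'))

print(matched)
```

With a large file this replaces one tree search per row with one dictionary lookup per row.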

1 Answer

You should learn XML and XPath, because the main problem has nothing to do with pandas; it is about XML.

You can use a more complex XPath expression to find the node with name M_20_K40745170 and its subnode mwan, in which you will have to search for per and update it (or add a new per):

node = root.find('./design/sec[@name="M_20_K40745170"]//mwan')

You can use df.iterrows() to do this for every row:

for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))

Then you can search for the per element with fre="Volum_5mb"

item = node.find('./per[@fre="Volum_5mb"]')

and create a new one if it is missing, then set its value

if item is None:  # "if not item:" would be wrong: an Element without children is falsy
    item = ET.SubElement(node, 'per')
    item.set('fre', "Volum_5mb")

item.set('value', str(row['Volum_5mb']))
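One detail worth checking: when the column is read as a float, str() gives "89.0", while the desired output in the question shows "89.00". A minimal sketch of fixed two-decimal formatting follows; format_value is a hypothetical helper name, not part of pandas or ElementTree.

```python
def format_value(value, decimals=2):
    """Return the number as text with a fixed number of decimals (hypothetical helper)."""
    return '{:.{}f}'.format(value, decimals)

print(str(89.0))           # plain conversion drops the trailing zero
print(format_value(89.0))  # keeps two decimals
```

Another option would be to read the columns as strings in the first place (for example with dtype=str in pd.read_csv), so the text is copied unchanged.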

And you can loop over the list ['Volum_5mb', 'Volum_40mb', 'Volum_70mb'] to handle all the columns:

for fre in ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']:

    item = node.find('./per[@fre="{}"]'.format(fre))
    #print(fre, item)

    if item is None:
        item = ET.SubElement(node, 'per')
        item.set('fre', fre)

    item.set('value', str(row[fre]))
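The find-or-create step can also be wrapped in a small function. The sketch below uses a hypothetical helper name, set_per, and a tiny hand-written mwan element as test data. It compares with None on purpose: an Element without children evaluates as false, so a truthiness test on a found per element would create a duplicate.

```python
import xml.etree.ElementTree as ET

def set_per(mwan, fre, value):
    """Update the <per> child with the given fre attribute, or append a new one."""
    item = mwan.find('./per[@fre="{}"]'.format(fre))
    if item is None:  # nothing found, so create the element
        item = ET.SubElement(mwan, 'per')
        item.set('fre', fre)
    item.set('value', str(value))
    return item

# Hand-written example node (assumption: same shape as <mwan> in the question).
mwan = ET.fromstring('<mwan><per fre="Volum_5mb" value="old" /></mwan>')

set_per(mwan, 'Volum_5mb', '89.00')   # updates the existing element
set_per(mwan, 'Volum_40mb', '44.00')  # appends a new element

print(ET.tostring(mwan).decode())
```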

Minimal working code follows. The example data is embedded directly in the code, but you should read it from files.

text = '''                name       Volum_5mb      Volum_40mb     Volum_70mb
1     M_20_K40745170         89.00           44.00         77.00
2     M_20_K40745171         77.00           65.00         94.00'''

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" Insti="TGA">
   <design name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
      <sec name="abcd" sound_freq="abcd" c_ty="pv">
         <feature number="48">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="abcd" first_name="g7tty" description="xyz" />
      </sec>
      <sec name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" tension="SGCGSCGSCGSCGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
         </mwan>
      </sec>
      <sec name="M_20_K40745171" sound_freq="mhr17:7907528-7907599" tension="SGCGSCGSCGSHHGSC" s_c="0">
         <feature number="5748">
            <tfgt v="0.1466469683747654" y="0.0" units="sec" />
         </feature>
         <mwan sound_freq="mhr17:7907527-7907589" first_name="gtftty" description="xyz">
            <xyz abc="trt" id="abc" />
            <per fre="acc" value="abc" />
            <per fre="xyz" value="abc" />
            <per fre="yy" value="abc" />
         </mwan>
      </sec>
   </design>
</brand>'''

import pandas as pd
import io
import xml.etree.ElementTree as ET

#df = pd.read_csv('input.csv')
df = pd.read_csv(io.StringIO(text), sep=r'\s+')  # raw string avoids an invalid-escape warning
#print(df)

#tree = ET.parse('input.xml')
#root = tree.getroot()
root = ET.fromstring(xml)
tree = ET.ElementTree(root)

for index, row in df.iterrows():
    node = root.find('./design/sec[@name="{}"]//mwan'.format(row['name']))
    if node is None:  # name from the data frame is not in the XML
        continue

    for fre in ['Volum_5mb', 'Volum_40mb', 'Volum_70mb']:

        item = node.find('./per[@fre="{}"]'.format(fre))
        #print('item:', fre, '=', item)

        if item is None:  # an Element without children is falsy, so compare with None
            #print('new', item, fre)
            item = ET.SubElement(node, 'per')
            #item.tail = '\n         '  # to pretty print
            item.set('fre', fre)

        item.set('value', str(row[fre]))

    #print(ET.tostring(node).decode())
    
#---
    
print(ET.tostring(root).decode())  # decode bytes to get readable text
#tree.write('output.xml')
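For the output file itself, a rough sketch of writing with indentation and an XML declaration is shown below. It assumes Python 3.9 or newer, where ET.indent() is available, and it writes to an in-memory buffer that stands in for 'output.xml'. The small XML string is made-up test data.

```python
import io
import xml.etree.ElementTree as ET

# Made-up miniature document with the same nesting as in the question.
root = ET.fromstring('<brand><design><sec name="a"><mwan /></sec></design></brand>')
mwan = root.find('./design/sec/mwan')
ET.SubElement(mwan, 'per', {'fre': 'Volum_5mb', 'value': '89.0'})

tree = ET.ElementTree(root)
ET.indent(tree, space='   ')  # re-indents the whole tree; needs Python 3.9+

buffer = io.BytesIO()  # stands in for the file name 'output.xml'
tree.write(buffer, encoding='UTF-8', xml_declaration=True)
output = buffer.getvalue().decode('utf-8')

print(output)
```

Passing a file name such as 'output.xml' to tree.write() instead of the buffer writes the same content to disk.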

Doc: Modifying an XML File


1 Comment

The explanation text says pre where it should say per. Searching for pre with "Volum_5mb" would always return None. BTW, thanks for the explanation.
