I am working with an XML file in which I have to loop through the fields and compare each value against the rows before or after it.

<TRANS DESCRIPTION ="" NAME ="EXPRR" >
            <FIELD EXPR ="A1" NAME ="SD" PORTTYPE ="INPUT/OUTPUT"/>
            <FIELD EXPR ="V" NAME ="DDS" PORTTYPE ="VARIABLE"/>
            <FIELD EXPR ="C" NAME ="SSS" PORTTYPE ="OUTPUT"/>
            <FIELD EXPR ="SD" NAME ="SS" PORTTYPE ="VARIABLE"/>
            <FIELD EXPR ="XX" NAME ="EEEE" PORTTYPE ="INPUT/OUTPUT"/>
</TRANS> 

I would like to load this into memory so that I can look through the values and add a sequence number. For example:

seq  key  value
1    A1   SD
2    V    DDS
3    C    SSS
4    SD   SS
5    XX   EEEE

Once I have this, I have to check whether each value also appears in the rows below it. For example, SD appears again in a later row, and so on.

Is there a data structure I can use to perform this operation in Python 3?

2 Answers

ONE WAY:

import xml.etree.ElementTree as ET
import xmltodict
import pandas as pd

tree = ET.parse('<your xml file path here>')
xml_data = tree.getroot()
# here you can change the encoding type to be able to set it to the one you need
xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')

data_dict = dict(xmltodict.parse(xmlstr))
# drop the column by keyword; the positional axis argument was removed in pandas 2.0
df = pd.DataFrame(data_dict['TRANS']['FIELD']).drop(columns='@PORTTYPE')
print(df)

OUTPUT:

  @EXPR @NAME
0    A1    SD
1     V   DDS
2     C   SSS
3    SD    SS
4    XX  EEEE

You could use collections.defaultdict to collate your data before creating a DataFrame:

data = """<TRANS DESCRIPTION ="" NAME ="EXPRR" >
            <FIELD EXPR ="A1" NAME ="SD" PORTTYPE ="INPUT/OUTPUT"/>
            <FIELD EXPR ="V" NAME ="DDS" PORTTYPE ="VARIABLE"/>
            <FIELD EXPR ="C" NAME ="SSS" PORTTYPE ="OUTPUT"/>
            <FIELD EXPR ="SD" NAME ="SS" PORTTYPE ="VARIABLE"/>
            <FIELD EXPR ="XX" NAME ="EEEE" PORTTYPE ="INPUT/OUTPUT"/>
          </TRANS> 
       """

import xml.etree.ElementTree as ET
from collections import defaultdict

import pandas as pd

root = ET.fromstring(data)


collection = defaultdict(list)

for child in root:
    collection['key'].append(child.attrib['EXPR'])
    collection['value'].append(child.attrib['NAME'])

pd.DataFrame(collection).rename_axis('seq')
 
    key value
seq          
0    A1    SD
1     V   DDS
2     C   SSS
3    SD    SS
4    XX  EEEE
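
Neither table above performs the comparison the question asks for, so here is a minimal sketch of one way to do it with the standard library only. It assumes that "value exists in the below rows" means a row's NAME appears as the EXPR of a later row (as SD does in the sample). The function name later_references and the 1-based sequence numbers are illustrative choices, not part of either answer.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<TRANS DESCRIPTION="" NAME="EXPRR">
    <FIELD EXPR="A1" NAME="SD" PORTTYPE="INPUT/OUTPUT"/>
    <FIELD EXPR="V" NAME="DDS" PORTTYPE="VARIABLE"/>
    <FIELD EXPR="C" NAME="SSS" PORTTYPE="OUTPUT"/>
    <FIELD EXPR="SD" NAME="SS" PORTTYPE="VARIABLE"/>
    <FIELD EXPR="XX" NAME="EEEE" PORTTYPE="INPUT/OUTPUT"/>
</TRANS>"""


def later_references(xml_text):
    """Return (rows, refs) for the FIELD elements in xml_text.

    rows is a list of (seq, key, value) tuples, with seq starting at 1.
    refs maps (seq, value) to the sequence numbers of later rows whose
    key equals that value. (Hypothetical helper for illustration.)
    """
    root = ET.fromstring(xml_text)
    rows = [
        (seq, field.attrib["EXPR"], field.attrib["NAME"])
        for seq, field in enumerate(root.iter("FIELD"), start=1)
    ]

    # Index every key (EXPR) by the sequence numbers where it occurs,
    # so each lookup is a dictionary access instead of a rescan.
    positions = defaultdict(list)
    for seq, key, _ in rows:
        positions[key].append(seq)

    refs = {}
    for seq, _, value in rows:
        # .get() avoids creating empty entries in the defaultdict.
        later = [s for s in positions.get(value, []) if s > seq]
        if later:
            refs[(seq, value)] = later
    return rows, refs


rows, refs = later_references(XML)
print(refs)  # {(1, 'SD'): [4]}
```

Indexing the keys first keeps the check to one pass over the rows plus dictionary lookups. If the comparison should look at earlier rows instead, changing `s > seq` to `s < seq` would be the only adjustment under the same assumption.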
