1

I want to print data from an XML file. To do that, I created a dictionary to parse the file. Finally, I used a for loop to print the data in a new DataFrame.

<BREVIER>
  <BRV>
    <MONO>stuff</MONO>
    <TITD>stuff</TITD>
    <TITF>Blabla</TITF>
    <CMPD>stuff</CMPD>
    <CMPF>stuff</CMPF>
    <INDD>stuff</INDD>
    <INDF>Blablo</INDF>
    <CINDD>stuff</CINDD>
    <CINDF>stuff</CINDF>
    <POSD>stuff</POSD>
    <POSF>stuff</POSF>
    <DEL>true</DEL>
  </BRV>

and so on with many, many BRV categories.

The output I expect to have:

Nom_du_medicament    Indication
Blabla               Blablo

I tried this code:

# encoding: utf-8

import xmltodict
import pprint
import json
import pandas as pd

with open('Brevier.xml',encoding='UTF-8','rb') as fd:
    my_dict = xmltodict.parse(fd.read(),encoding='UTF-8')
    tableau_indic=pd.DataFrame()
    for section in my_dict ['BREVIER']['BRV']:
        drugname = section.get('TITF')
        print(drugname in tableau_indic.loc(["Nom_du_medicament"]))
        drugindication = section.get('INDF')
        print(drugindication in tableau_indic.loc(["Indication"]))
print(tableau_indic)
fd.close()

I am getting a type error TypeError: unhashable type: 'list'

Since it did not work, here is the second method I tried using .loc:

# encoding: utf-8
import xmltodict
import pprint
import json
import pandas as pd

with open('Brevier.xml',encoding='UTF-8') as fd:
    my_dict = xmltodict.parse(fd.read(),encoding='UTF-8')
    tableau_indic=pd.DataFrame
    for section in my_dict ['BREVIER']['BRV']:
        drugname = section.get('TITF')
        print(tableau_indic.loc["Nom_du_medicament"])
        drugindication = section.get('INDF')
        print(tableau_indic.loc["Indication"])
print(tableau_indic)
fd.close()

This time I had the KeyError: 'Nom_du_medicament' error.

Is there a way to avoid these errors?

4
  • Why bother with all these conversions instead of printing directly from the xml file? Commented Oct 1, 2020 at 16:36
  • I am a beginner and that was the most logical way that I've imagined. Commented Oct 1, 2020 at 16:41
  • I would suggest you edit your question with a representative sample of the xml and the expected output from that sample and we can see what can be done with it. Commented Oct 1, 2020 at 16:43
  • Ok, I have added a representative sample of the xml and the expected output from that sample Commented Oct 1, 2020 at 16:52

1 Answer 1

2

There are several ways to approach it, but basically, since you are dealing with an xml file, might as well use xml tools like xpath.

Let's say your xml looks like this:

meds = """<BREVIER>
  <BRV>
    <MONO>stuff</MONO>
    <TITF>Blabla</TITF>
    <CMPD>stuff</CMPD>
    <INDF>Blablo</INDF>
    <CINDD>stuff</CINDD>
    <DEL>true</DEL>
  </BRV>
  <BRV>
    <MONO>stuff</MONO>
    <TITF>Blabla 2</TITF>
    <CMPD>stuff</CMPD>
    <INDF>Blablo 2</INDF>
    <CINDD>stuff</CINDD>
    <DEL>true</DEL>
  </BRV>
</BREVIER>"""

You can use lxml to process it:

from lxml import etree
doc = etree.XML(meds)
print('Nom_du_medicament Indication')
for m in doc.xpath('//BRV'):
      print(m.xpath('./TITF/text()')[0], m.xpath('./INDF/text()')[0])

Output:

Nom_du_medicament Indication
Blabla Blablo
Blabla 2 Blablo 2

From here, you can format the output, load it into a dataframe or whatever.

Sign up to request clarification or add additional context in comments.

2 Comments

Your code seems to run fine but I still have one last error related to the XML file: lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1 This is confusing because I checked my file and the first line is <Brevier> so normally there shouldn't be any errors.
@marou95thebest Your actual xml is probably different from the sample in my question, so the error indicates some problem with the formatting of the xml. Ocassionally, it can be solved by changing doc = etree.XML(meds) to doc = etree.fromstring(meds). If not, you have to make sure your actual xml is valid.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.