xml.etree.ElementTree.ParseError issue when trying to extract data from XML using PY3

Question

I am having an issue trying to extract the email from an XML file using Python 3.

My code is:

import xml.etree.ElementTree as ET
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

data = '''<row>
    <row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
        <codice_regionale>MI1604</codice_regionale>
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <indirizzo>VIA VARESINA, 121</indirizzo>
        <localita>Milano</localita>
        <telefono>3480813398</telefono>
        <email>[email protected]</email>
        <caratterizzazione>urbana</caratterizzazione>
        <esenzioni>true</esenzioni>
        <location latitude="45.500881" longitude="9.141339"/>
</row>'''

tree = ET.fromstring(data) #standard ET
results = tree.findall('email') #find the count section in xml

print(results.text)

The error I get is

Traceback (most recent call last):
  File "farmacie.py", line 25, in <module>
    tree = ET.fromstring(data) #standard ET
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/xml/etree/ElementTree.py", line 1321, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 12, column 6

How can I solve this?

You're missing a closing </row>, or that extra <row> at the start isn't supposed to be there.
– Alain T., Commented Mar 20, 2020 at 12:24

cwalvoort · Accepted Answer · 2020-03-20 13:12:22Z

1

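There are two problems here. First, the row element is opened twice but closed only once (either the extra <row> at the start should go, or a second </row> end tag is missing), so the parser reaches the end of the input while still expecting a closing tag. Second, findall() returns a list, not a single element, so you need to pick one item or print them all: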

import xml.etree.ElementTree as ET

data = '''<row _id="row-jyi7-56ru_b7km" _uuid="00000000-0000-0000-B614-7FFDD7C1595B" _position="0" _address="https://www.dati.lombardia.it/resource/zzzz-zzzz/row-jyi7-56ru_b7km">
        <codice_regionale>MI1604</codice_regionale>
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <indirizzo>VIA VARESINA, 121</indirizzo>
        <localita>Milano</localita>
        <telefono>3480813398</telefono>
        <email>[email protected]</email>
        <caratterizzazione>urbana</caratterizzazione>
        <esenzioni>true</esenzioni>
        <location latitude="45.500881" longitude="9.141339"/>
</row>'''

tree = ET.fromstring(data)  # parse the string; returns the root <row> element
results = tree.findall('email')  # list of <email> children of the root

print(results[0].text)  # text of the first match

Or:

for r in results:
    print(r.text)
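
If the outer <row> wrapper is meant to stay, the other repair is to close it with a second </row>. The following is a minimal sketch of that variant, assuming a shortened copy of the record above; the address example@example.com is a placeholder, not data from the original file. It also uses findtext(), which returns the text of the first match (or a default) instead of a list.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the e-mail value is a placeholder, not real data.
broken = '''<row>
    <row _id="row-jyi7-56ru_b7km">
        <denom_farmacia>Farmacia Varesina</denom_farmacia>
        <email>example@example.com</email>
</row>'''

# The outer <row> is never closed, so the parser runs out of input
# while still expecting an end tag.
try:
    ET.fromstring(broken)
    parse_failed = False
except ET.ParseError as err:
    parse_failed = True
    print("ParseError:", err)

# Repair: close the outer element as well.
fixed = broken + "\n</row>"
root = ET.fromstring(fixed)

# <email> is now a grandchild of the root, so the path goes through the
# inner <row>. findtext() returns the first match's text, or the default
# when nothing matches.
email = root.findtext("row/email", default="")
print(email)
```

With the wrapper kept, a plain findall('email') on the root would find nothing, because the search is relative to the root and only looks at direct children unless a path is given.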

Update:

With the full dataset, where each record is nested as row/row under the root element, all of the emails can be extracted like this:

import xml.etree.ElementTree as ET
import requests

data = requests.get('https://www.dati.lombardia.it/api/views/5dq5-xs9z/rows.xml').content

tree = ET.fromstring(data)
results = tree.findall("./row/row/email")

for r in results:
    print(r.text)

Results (2,684 rows):

[email protected]
[email protected]
[email protected]
[email protected]
...
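
The path "./row/row/email" works because the records sit two levels below the root of the full file. Here is a small offline sketch of that layout, assuming a <response> root element and placeholder e-mail values (neither is copied from the dataset). It also skips records whose <email> is missing or empty, which is only a guess about what the real data may contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the full file: root -> <row> -> <row> records.
sample = '''<response>
  <row>
    <row _id="a"><email>first@example.com</email></row>
    <row _id="b"><email/></row>
    <row _id="c"><denom_farmacia>No email here</denom_farmacia></row>
    <row _id="d"><email>second@example.com</email></row>
  </row>
</response>'''

root = ET.fromstring(sample)

emails = []
for record in root.findall("./row/row"):
    # findtext() returns None when <email> is absent and "" when it is empty.
    text = record.findtext("email")
    if text:
        emails.append(text.strip())

print(emails)
```

Looping over the records rather than over the <email> elements keeps each address tied to its record, which helps if other fields such as denom_farmacia are needed alongside it.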

edited Mar 20, 2020 at 13:12

answered Mar 20, 2020 at 12:25

cwalvoort


4 Comments

Filippo Ugo Over a year ago

Thanks, now it works. However, when I try to extend the process to a bigger XML file (dati.lombardia.it/api/views/5dq5-xs9z/rows.xml), it still does not work. Any suggestions?

cwalvoort Over a year ago

From the dataset you linked, it looks like you might be looking for tree.findall("./row/row/email"). That will pull all the email elements from the entire set.

cwalvoort Over a year ago

docs.python.org/2/library/xml.etree.elementtree.html#example

Filippo Ugo Over a year ago

Thank you very much. However, I am still having issues: if I try to insert the entire dataset, I still get an error (xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 401, column 60). How should I solve this? Should I import the data through the link or by copy-pasting it into the data variable? Thank you again for the help!
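
One way to see what the parser is objecting to is to catch the ParseError and look at the offending line; the exception carries a position attribute holding the (line, column) of the failure. This is a minimal sketch, assuming the XML is held in a string. The helper name show_parse_error and the sample with an unescaped & are hypothetical stand-ins, since the actual content of line 401 is not shown in this thread.

```python
import xml.etree.ElementTree as ET

def show_parse_error(xml_text):
    """Return None if xml_text parses, else (line, column, offending line)."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        # Guard against a reported line past the end of the text.
        bad_line = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, column, bad_line

# Hypothetical bad input: a raw "&" is not allowed in XML text.
sample = "<row>\n  <denom_farmacia>Rossi & Figli</denom_farmacia>\n</row>"
info = show_parse_error(sample)
print(info)
```

Parsing the bytes returned by the download directly, as in the update to the answer, avoids anything a copy-paste step might alter, though whether that is the cause of this particular error is not established here.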
