0

I'm coding a script to extract information from several XML files with the same structure but with missing sections when there is no information related to a tag. The easiest way to achieve this was using try/except so instead of getting a "AtributeError: 'NoneType' object has no atrribute 'find'" I assign an empty string('') to the object in the exeption. Something like this:

try:
   string1=root.find('value1').find('value2').find('value3').text
except:
   string1=''

The issue is that I want to shrink my code by using a function:

def extract(string):
    tempstr=''
    try:
        tempstr=string.replace("\n", "")
    except:
        if tempstr is None:
            tempstr=""
    return string

And then I try to called it like this:

string1=extract(root.find('value1').find('value2').find('value3').text)

and value2 or value3 does not exist for the xml that is being processed, I get and AttributeError even if I don't use the variable in the function making the function useless.

Is there a way to make a function work, maybe there is a way to make it run without checking if the value entered is invalid?

Solution:

I'm using a mix of both answers:

def extract(root, xpath):
    tempstr=''
    try:
        tempstr=root.findall(xpath)[0].text.replace("\n", "")
    except:
        tempstr=''#To avoid getting a Nonetype object
    return tempstr

2 Answers 2

1

You can try something like that:

def extract(root, children_keys: list):
    target_object = root
    result_text = ''
    try:
        for child_key in children_keys:
            target_object = target_object.find(child_key)
        result_text = target_object.text
    except:
        pass

    return result_text

You will go deeper at XML structure with for loop (children_keys - is predefined by you list of nested keys of XML - xml-path to your object). And if error will throw inside that code - you will get '' as result.

Example XML (source):

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>
    <y>Don't forget me this weekend!</y>
  </body>
</note>

Example:

import xml.etree.ElementTree as ET
tree = ET.parse('note.xml')
root = tree.getroot()
children_keys = ['body', 'y']
result_string = extract(root, children_keys)
print(result_string)

Output:

"Don't forget me this weekend!"
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks this is a great solution sadly I'm working with hundres of thousands of xml's so I'm avoiding iterations as much as possible. I think that the other solution while incomplete is the best approach. I gave you the right answer because is well presented and easy to understand for beginners like me. At the end I will use the solution on the update of my question.
1

Use XPATH expression

import xml.etree.ElementTree as ET

xml1 = '''<r><v1><v2><v3>a string</v3></v2></v1></r>'''
root = ET.fromstring(xml1)
v3 = root.findall('./v1/v2/v3')
if v3:
  print(v3[0].text)
else:
  print('v3 not found')


xml2 = '''<r><v1><v3>a string</v3></v1></r>'''
root = ET.fromstring(xml2)
v3 = root.findall('./v1/v2/v3')
if v3:
  print(v3[0].text)
else:
  print('v3 not found')

output

a string
v3 not found

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.