How to get text inside specific tag using tag name with Python

Question

I'm trying to open an XML file and parse it, looking through its tags and finding the text within each specific tag. If the text within a tag matches a string, I want to remove part of the string or substitute it with something else.

My question: I'm not sure whether start = x.find('start_char').text actually gets the text inside the start_char tag and saves it to the start variable. (Does x.find('tag_name').text actually get the text inside the tag?)

The XML file has the following data:

<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex >
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW.+\.$</regex >
        <start_char>3</start_char>
        <end_char>-1</end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex >
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex >
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex >
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>

The Python code I'm using is:

from xml.etree.ElementTree import ElementTree    

# filters.xml is the file that holds the things to be filtered
tree = ElementTree()
tree.parse("filters.xml")

# Get the data in the XML file 
root = tree.getroot()

# Loop through filters
for x in root.findall('filter'):

    # Find the text inside the regex tag
    regex = x.find('regex').text

    # Find the text inside the start_char tag
    start = x.find('start_char').text

    # Find the text inside the end_char tag
    end = x.find('end_char').text

    # Find the text inside the replacement tag
    #replace = x.find('replacement')

    # Find the text inside the action tag
    action = x.find('action').text

    if action == 'remove':
        if re.match(r'regex', mfn_pn, re.IGNORECASE):
            mfn_pn = mfn_pn[start:end]

    elif action == 'substitute':
        mfn_pn = re.sub(r'regex', '', mfn_pn)

    return mfn_pn

It would be a barcode inputted by the user, something similar to ATL-157-1815, DFW-184-8378.
– Sophia, Commented Dec 17, 2020 at 14:22

Alexandra Dudkina · Accepted Answer · 2020-12-17 14:31:31Z

1

The code start = x.find('start_char').text works when the filter element has a start_char child. Otherwise x.find() returns None and the attribute access raises AttributeError: 'NoneType' object has no attribute 'text'.

This can be avoided, for example, with the following approach:

# find element
start_el = x.find('start_char')
# if the element exists, assign its text to the variable; otherwise use None (or another default value)
start = start_el.text if start_el is not None else None

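The same applies to the end variable.

With this approach, the following start and end values are retrieved for your example document (an element that exists but is empty, such as <end_char></end_char>, also has a .text of None):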

3 None
3 -1
None None
None None
None -4
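
As an alternative sketch (not part of the original answer), ElementTree's findtext() can fold the existence check into a single call: it returns the default when the child is missing, and an empty string when the child exists but has no text. The helper name child_int below is hypothetical, and the XML is a shortened copy of the question's data; the helper converts the text to an int usable as a slice bound, or None.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data: one filter with an empty end_char,
# one filter with no start_char/end_char children at all.
XML = r"""<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
</metadata>"""


def child_int(element, tag):
    """Hypothetical helper: child text as an int, or None if the child is missing or empty."""
    text = element.findtext(tag, default="").strip()
    return int(text) if text else None


root = ET.fromstring(XML)
bounds = [
    (child_int(f, "start_char"), child_int(f, "end_char"))
    for f in root.findall("filter")
]
print(bounds)  # [(3, None), (None, None)]
```

Converting to int (or None) matters because the values read from XML are strings, and a slice such as mfn_pn[start:end] only accepts integers or None.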


5 Comments

Sophia Over a year ago

Awesome, thank you so much! Using "for x in root.findall('filter'):", is it actually looping through all the data in the XML file, or does it only look at the first "filter" tag?

Alexandra Dudkina Over a year ago

findall() searches for all filter elements and iterates over them.

Sophia Over a year ago

For some reason, it's not looping through all the filter elements for me. It only goes through what's in the first filter element and stops there.

Alexandra Dudkina Over a year ago

That probably happens because of the return statement inside the loop.

Sophia Over a year ago

I took the return statement out of the loop and placed it so it’s aligned with the for loop.
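
Putting the thread together, here is a minimal sketch of how the loop might look with the return moved out of the loop. It is an illustration under assumptions, not the asker's final code: the function name apply_filters and the helper to_index are hypothetical, the XML is a shortened copy of the question's data, and the pattern is passed as the regex variable rather than the literal string r'regex' used in the question.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the question's filters.xml, inlined so the sketch is self-contained
FILTERS_XML = r"""<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
</metadata>"""


def to_index(text):
    """Hypothetical helper: element text as a slice bound; missing or empty means no bound."""
    return int(text) if text and text.strip() else None


def apply_filters(mfn_pn, root):
    """Hypothetical wrapper: apply every <filter> to mfn_pn, then return once."""
    for x in root.findall("filter"):
        regex = x.findtext("regex")
        action = x.findtext("action")
        if regex is None:
            continue  # skip a filter that has no regex child
        if action == "remove":
            # Use the regex variable itself, not the literal string 'regex'
            if re.match(regex, mfn_pn, re.IGNORECASE):
                start = to_index(x.findtext("start_char"))
                end = to_index(x.findtext("end_char"))
                mfn_pn = mfn_pn[start:end]
        elif action == "substitute":
            mfn_pn = re.sub(regex, x.findtext("replacement", default=""), mfn_pn)
    # Returning here, after the loop, lets every filter run
    return mfn_pn


result = apply_filters("ATL-157-1815", ET.fromstring(FILTERS_XML))
print(result)  # 1571815
```

With the return inside the loop body, the function would exit after the first filter, which matches the behaviour described in the comments above.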
