0

I need to parse xml file and find a values only starts with "123". How i can do this using this code below? It is possible to use regex inside this syntax?

import xml.etree.ElementTree as ET
parse = ET.parse('xml.xml')
print([ events.text for record in parse.findall('.configuration/system/') for events in record.findall('events')])

xml.xml

<rpc-reply>
 <configuration>
        <system>
            <preference>
                <events>123</events>
                <events>124</events>
                <events>1235</events>                    
            </preference>
        </system>
 </configuration>
</rpc-reply>
3
  • For regex, it would be probably the best to operate on un-parsed (string) xml - just get re.findall(r'>(123[^<]*)<', xml_string) - gets this: ['123', '1235']. Will get anything between > and < that starts with 123 and consists of any characters that are not < (because it should end the search, obviously). Commented Nov 7, 2019 at 8:06
  • @h4z3 yeah but, what if i only find those values inside <preference> tag or another? Commented Nov 7, 2019 at 8:17
  • Read about XPath use in ET.findall. XPath can do such filtering but you need to test what exactly you want. (Just play with it in the interactive console. I do that when I need something specific.) Commented Nov 7, 2019 at 8:24

1 Answer 1

1

XPath predicate can do that much using built-in function starts-with(). But you need to use library that fully support XPath 1.0 such as lxml:

from lxml import etree as ET
raw = '''<rpc-reply>
 <configuration>
        <system>
            <preference>
                <events>123</events>
                <events>124</events>
                <events>1235</events>                    
            </preference>
        </system>
 </configuration>
</rpc-reply>'''
root = ET.fromstring(raw)
query = 'configuration/system/preference/events[starts-with(.,"123")]'
print([events.text for events in root.xpath(query)])

If you still want to use regex, lxml supports regex despite XPath 1.0 specification does not include regex (see: Regex in lxml for python).

xml.etree only supports limited subset of XPath 1.0 expression, which does not include starts-with function (and definitely does not support regex). So you need to rely on python string function to check that:

....
query = 'configuration/system/preference/events'
print([events.text for events in root.findall(query) if events.text.startswith('123')])
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.