How to fetch content of XML root element in Python?

Question

I have an XML file, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    First line. <br/> Second line.
</root>

The output I want is '\nFirst line. <br/> Second line.\n'. Note that if the root element contains nested elements, they should be returned as is.

So you just want to strip off the start and end tags of the root element?
– mzjn, Commented Jul 12, 2011 at 20:12
Basically, yes, but I need a general-purpose approach. The XML might not look exactly like this; for example, it could contain a <!DOCTYPE> declaration.
– sam, Commented Jul 13, 2011 at 14:13
Do you want the parsed content of the root element (which might include expanded entities, for example), or do you simply want the verbatim string between the start and end tags?
– mzjn, Commented Jul 13, 2011 at 15:45

Anton Moiseev · Accepted Answer · 2011-07-14 10:48:56Z

3

The first thing I came up with:

from xml.etree.ElementTree import fromstring, tostring

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
    First line.<br/>Second line.
</root>
'''

xml = fromstring(source)
# tostring() returns bytes on Python 3, so decode before using str methods
result = tostring(xml).decode('ascii').lstrip('<%s>' % xml.tag).rstrip('</%s>' % xml.tag)

print(result)

# output:
#
#   First line.<br />Second line.
#

However, this is not a truly general-purpose approach: it fails if the opening tag of the root element (<root>) contains any attributes.
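A minimal sketch of that failure, assuming Python 3 (where `tostring(..., encoding='unicode')` returns `str`). It reuses the stripping expression from the first snippet on a root element that carries an attribute:

```python
from xml.etree.ElementTree import fromstring, tostring

source = '<root attr1="val1">First line.<br/>Second line.</root>'

xml = fromstring(source)
serialized = tostring(xml, encoding='unicode')

# Stripping stops at the space after "<root", so the attribute
# and the closing ">" of the start tag are left in the result.
result = serialized.lstrip('<%s>' % xml.tag).rstrip('</%s>' % xml.tag)
print(repr(result))
```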

UPDATE: This approach has another issue. Because lstrip and rstrip remove any combination of the given characters rather than an exact prefix or suffix, you can run into this problem:

# input:
<?xml version="1.0" encoding="UTF-8"?><root><p>First line</p></root>

# result:
p>First line</p
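The character-set behaviour can be reproduced without any XML parsing. The sketch below (Python 3) shows the broken result next to one possible fix that removes the tags as an exact prefix and suffix. `strip_root_tags` is a hypothetical helper name, and it assumes the root start tag has no attributes:

```python
serialized = '<root><p>First line</p></root>'

# lstrip/rstrip treat their argument as a set of characters,
# so they also eat the "<" of <p> and the ">" of </p>.
broken = serialized.lstrip('<root>').rstrip('</root>')


def strip_root_tags(text, tag):
    """Remove an exact '<tag>' prefix and '</tag>' suffix if both are present."""
    start, end = '<%s>' % tag, '</%s>' % tag
    if text.startswith(start) and text.endswith(end):
        return text[len(start):len(text) - len(end)]
    return text


fixed = strip_root_tags(serialized, 'root')
print(broken)
print(fixed)
```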

If you really need only the literal string between the opening and closing tags (as you mentioned in the comment), you can use this:

from xml.etree.ElementTree import fromstring, tostring

source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
    First line.<br/>Second line.
</root>
'''

# parsing and re-serializing is needed only to cut
# the declaration, doctypes, etc.
xml = fromstring(source)
# tostring() returns bytes on Python 3, so decode to text
xml_str = tostring(xml).decode('ascii')

# end of the root start tag and start of the root end tag
start = xml_str.index('>')
end = xml_str.rindex('<')

result = xml_str[start + 1:end]

It is not the most elegant approach, but unlike the previous one it works correctly with attributes in the opening tag, as well as with any valid XML document.
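A different technique, offered here only as a sketch and not as part of the answer above, avoids slicing the serialized string: concatenate the root's leading text with the serialization of each child (ElementTree writes a child's tail text along with the child). This assumes Python 3; `inner_xml` is a hypothetical helper name.

```python
from xml.etree.ElementTree import fromstring, tostring
from xml.sax.saxutils import escape


def inner_xml(root):
    """Return the serialized content between the root's start and end tags."""
    # root.text is the text before the first child (None if there is none);
    # it is stored unescaped, so escape &, < and > again.
    parts = [escape(root.text or '')]
    # Serializing a child also emits its tail, i.e. the text that follows it.
    parts.extend(tostring(child, encoding='unicode') for child in root)
    return ''.join(parts)


source = '''<?xml version="1.0" encoding="UTF-8"?>
<root attr1="val1">
    First line.<br/>Second line.
</root>
'''

result = inner_xml(fromstring(source))
print(repr(result))
```

Like the slicing approach, this returns re-serialized content rather than the verbatim source text: for example, `<br/>` comes back as `<br />`.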

edited Jul 14, 2011 at 10:48

answered Jul 13, 2011 at 15:56


2 Comments

mzjn Over a year ago

xml_str = tostring() should be xml_str = tostring(xml).

mzjn Over a year ago

Namespaces can mess things up. For example, if the root element has an xmlns="http://foo.com" declaration, your solution does not quite work.
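A small sketch of the effect, assuming Python 3 and only the standard library. When a root element with a default namespace is parsed and serialized again, ElementTree does not keep the prefix-less form; it writes generated prefixes instead, so the sliced result no longer matches the source text:

```python
from xml.etree.ElementTree import fromstring, tostring

source = '<root xmlns="http://foo.com"><p>First line</p></root>'

xml_str = tostring(fromstring(source), encoding='unicode')
# Same slicing as in the answer: between the first ">" and the last "<".
inner = xml_str[xml_str.index('>') + 1:xml_str.rindex('<')]

print(xml_str)
print(inner)
```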

infrared · Accepted Answer · 2011-07-12 19:03:12Z

0

Parse from file:

from xml.etree.ElementTree import parse
tree = parse('yourxmlfile.xml')
print(tree.getroot().text)

Parse from string:

from xml.etree.ElementTree import fromstring
print(fromstring(yourxmlstr).text)

edited Jul 12, 2011 at 19:03

answered Jul 12, 2011 at 18:52


5 Comments

sam Over a year ago

Thanks. But how can I parse XML from string, not from file?

Santa Over a year ago

That would return '\n First line. ', not what the OP wanted.

Santa Over a year ago

@asm from string: use xml.etree.ElementTree.fromstring.

sam Over a year ago

@Santa Unfortunately, xml.etree.ElementTree.fromstring returns an Element, not an ElementTree, so it has no getroot() method.

sam Over a year ago

@Santa You are right. But unfortunately you are also right about your first comment :). The text attribute returns only \n First line.
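The behaviour discussed in these comments follows from how ElementTree stores mixed content: `text` holds only the text before the first child, and whatever follows a child is stored on that child's `tail`. A short sketch (Python 3) using the sample from the question:

```python
from xml.etree.ElementTree import fromstring

source = '''<root>
    First line. <br/> Second line.
</root>'''

root = fromstring(source)
br = root[0]

print(repr(root.text))  # text before <br/>
print(repr(br.tail))    # text after <br/>, stored on the child element
```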
