Parsing XML Python

Question

I am using xml.etree.ElementTree to parse an XML file, and I do not know how to obtain the plain-text lines that appear between tags:

<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>

<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>

<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text

I have this code already:

import xml.etree.ElementTree as etree

data_file = "./file.xml"

# Map the 'desc' attribute of a child element to the label printed before its text.
LABELS = {
    "es": "ESP:",
    "la": "LATIN:",
    "*": "-ININT-",
    "fs": "-FS-",
    "throat": "-THROAT-",
    "laugh": "-LAUGH-",
}

xmlD = etree.parse(data_file)
root = xmlD.getroot()
# getchildren() was removed in Python 3.9; index and iterate elements directly.
sections = root[2]
for section in sections:
    for turn in section:
        speaker = turn.get('speaker')
        mode = turn.get('mode')
        for child in turn:
            time = child.get('time')
            opt = LABELS.get(child.get('desc'), "")
            # tail is None when no text follows the element.
            print(speaker, mode, time, opt + (child.tail or ""))

I can navigate the XML down to the Sync, Background, and Event tags, but I can't extract the text that follows them. I have posted only a fragment of the XML file, not the entire file. The only part I have trouble with is the final piece of code.

Thank you so much, @alecxe. Now I can get the information I needed, but I have a new small problem. I obtain the line through the tail attribute, but a newline character (\n) or something similar seems to be inserted before it. I need output like this:

spk1 planned LAN: Plain text from tail

But I get this:

spk1 planned LAN:
Plain text from tail

I have tried many things, including re.match() and sed commands run after processing the XML, but there seems to be no \n newline character to match, and I still can't move the plain text up onto the same line. Thank you in advance.

Anyone? Thank you!

alecxe · Accepted Answer · 2015-05-11 11:27:10Z

3

This text is called the tail of an element. The ElementTree documentation describes it as follows:

The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element’s end tag and before the next tag.
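As a minimal sketch of the difference between text and tail, consider a tiny made-up document (the element names a and b are hypothetical, chosen only for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical one-line document: <b/> is followed by trailing text inside <a>.
root = ET.fromstring("<a>before<b/>after</a>")

# text: what sits between <a> and its first child.
print(repr(root.text))

# tail: what sits after <b/> and before the next tag.
print(repr(root.find("b").tail))
```

The trailing text belongs to the child element it follows, not to the parent, which is why it does not show up in the parent's text.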

Locate the Event element and read its tail, for example:

section.find("Event").tail
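Regarding the stray newline mentioned in the follow-up: in the posted fragment the text starts on the line after the tag, so the tail most likely begins with a newline and surrounding whitespace. Below is a sketch of one way to handle that, assuming the fragment sits inside a Turn element. The wrapper element and its speaker and mode values are assumptions taken from the question's code and output, not from the real file.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, wrapped in a hypothetical <Turn> element
# (with assumed attribute values) so that it parses as a well-formed document.
FRAGMENT = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>

<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>

<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text
</Turn>"""


def tail_lines(turn):
    """Return (tag, stripped tail) for each child followed by non-blank text."""
    result = []
    for child in turn:
        # tail is None when no text follows the element; strip() drops the
        # newline that separates the tag from the text on the next line.
        text = (child.tail or "").strip()
        if text:
            result.append((child.tag, text))
    return result


turn = ET.fromstring(FRAGMENT)
for tag, text in tail_lines(turn):
    print(turn.get("speaker"), turn.get("mode"), tag, text)
```

Iterating over every child, rather than calling find("Event") once, also picks up the text that follows Sync elements. Children followed only by whitespace are skipped.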

