I am using xml.etree.ElementTree to parse an XML file, but I do not know how to get the plain text lines that sit between tags, as in this fragment:
<Sync time="4.496"/>
<Background time="4.496" type="music" level="high"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
<Event desc="b" type="noise" extent="instantaneous"/>
Plain text
<Sync time="10.949"/>
Plain text
I have this code already:
import xml.etree.ElementTree as etree
import os
data_file = "./file.xml"
xmlD = etree.parse(data_file)
root = xmlD.getroot()
sections = root.getchildren()[2].getchildren()
for section in sections:
    turns = section.getchildren()
    for turn in turns:
        speaker = turn.get('speaker')
        mode = turn.get('mode')
        childs = turn.getchildren()
        for child in childs:
            time = child.get('time')
            opt = child.get('desc')
            if opt == 'es':
                opt = "ESP:"
            elif opt == "la":
                opt = "LATIN:"
            elif opt == "*":
                opt = "-ININT-"
            elif opt == "fs":
                opt = "-FS-"
            elif opt == "throat":
                opt = "-THROAT-"
            elif opt == "laugh":
                opt = "-LAUGH-"
            else:
                opt = ""
            print speaker, mode, time, opt+child.tail.encode('latin-1')
I can walk the XML down to the Sync, Background and Event tags, but I can't extract the text that follows them. I have posted only a piece of the XML file, not the entire file. My only problem is with the last part of the code.
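For reference, here is a minimal sketch of how ElementTree exposes that text. It assumes Python 3 and a made-up Turn fragment modelled on the sample above (the real file has more nesting, and the helper name tails is only illustrative). The text that follows a child element is stored on that child's .tail attribute, not on the parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the sample in the question.
SAMPLE = """<Turn speaker="spk1" mode="planned">
<Sync time="4.496"/>
<Event desc="pause" type="noise" extent="instantaneous"/>
Plain text
<Sync time="7.186"/>
More plain text
</Turn>"""

turn = ET.fromstring(SAMPLE)


def tails(parent):
    """Return (tag, tail) pairs for each direct child of parent."""
    # .tail is the text between the end of a child and the next tag.
    # It is None when nothing follows, so fall back to an empty string.
    return [(child.tag, child.tail or "") for child in parent]


for tag, tail in tails(turn):
    # repr() makes the surrounding line breaks visible.
    print(tag, repr(tail))
```

Printing with repr() shows that each tail keeps the line breaks that surround it in the file.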
Thank you so much, @alecxe. Now I can get the info I needed, but I have a new small problem. I get the line through the tail attribute, but a newline character \n (or something similar) appears before it. I need something like:
spk1 planned LAN: Plain text from tail
But I get this:
spk1 planned LAN:
Plain text from tail
I have tried many things, such as re.match() and sed commands run after processing the XML, but they behave as if there were no \n newline character, and I still can't pull the plain text up onto the same line. Thank you in advance.
Anyone? Thank you!
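One possible approach, sketched here under assumptions: the tail begins with the line break that follows the tag in the file, so stripping whitespace from it before printing should keep everything on one line. The fragment, the Event attributes and the helper name clean_tail below are hypothetical, and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET


def clean_tail(element):
    """Return element.tail without surrounding whitespace ('' if there is no tail)."""
    return (element.tail or "").strip()


# Hypothetical fragment: the text sits on its own line after the Event tag.
turn = ET.fromstring(
    '<Turn speaker="spk1" mode="planned">\n'
    '<Event desc="la" type="language" extent="instantaneous"/>\n'
    'Plain text from tail\n'
    '</Turn>'
)

event = turn[0]
line = " ".join([turn.get("speaker"), turn.get("mode"), "LATIN:", clean_tail(event)])
print(line)  # spk1 planned LATIN: Plain text from tail
```

In the original loop, the same idea would mean calling .strip() on child.tail before concatenating it with opt.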