I am having trouble with the following problem. Let's say I have some strings in two lists in a dictionary:
left              right
british           7
cuneate nucleus   Medulla oblongata
Motoneurons       anterior
And I have some test lines in a file, like the ones below:
<s id="69-7">British Meanwhile is the studio 7 album by british pop band 10cc 7.</s>
<s id="5239778-2">Medulla oblongata,the name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="21120-99">Terior horn cells, motoneurons located in the spinal.</s>
I want to get output like the following:
<s id="69-7"><w1>British</w1> Meanwhile is the studio <w2>7</w2> album by <w1>british</w1> pop band 10cc <w2>7</w2>.</s>
<s id="5239778-2"><w2>Medulla oblongata</w2>,the name refers collectively to the <w1>cuneate nucleus</w1> and gracile nucleus, which are present at the junction between the spinal cord and the <w2>medulla oblongata</w2>.</s>
I tried with the following code:
import re

def textReturn(left, right):
    text = ""
    filetext = open("text.xml", "r").read()
    # Split the file into lines on any run of newline characters
    linelist = re.split(r'[\r\n]+', filetext)
    for i in linelist:
        left = left.strip()
        right = right.strip()
        if left in i and right in i:
            # Wrap each term only when it has whitespace on both sides
            i1 = re.sub(r'(?i)(\s+)(%s)(\s+)' % left, r'\1<w1>\2</w1>\3', i)
            i2 = re.sub(r'(?i)(\s+)(%s)(\s+)' % right, r'\1<w2>\2</w2>\3', i1)
            text = text + i2 + "\n"
    return text
But it gives me:
<s id="69-7">British Meanwhile is the studio <w2>7</w2> album by <w1>british</w1> pop band 10cc 7.</s>
<s id="5239778-2">Medulla oblongata,the name refers collectively to the <w1>cuneate nucleus</w1> and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>
<s id="21120-99">Terior horn cells, <w1>motoneurons</w1> located in the spinal.</s>
That is, it can't tag a string that occurs at the beginning or end of the sentence.
Also, I only want to return the lines that match both the left and right strings, not the other lines.
Any solution, please? Thanks a lot!
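One possible way to tag terms at the start or end of a sentence, or next to punctuation, is to replace the `\s+` groups with lookarounds. The following is only a minimal sketch, assuming every `<s>` element sits on a single line and contains no nested tags. The helper names `tag_line` and `tag_lines` are hypothetical. Only the text between `<s ...>` and `</s>` is searched, so a term such as `7` is not tagged inside the `id` attribute.

```python
import re

def tag_line(line, left, right):
    """Return the line with `left` wrapped in <w1> and `right` in <w2>,
    or None if the line does not contain both terms."""
    # Assumption: the whole line is one <s ...>...</s> element.
    m = re.match(r'(<s\b[^>]*>)(.*)(</s>)$', line.strip(), re.DOTALL)
    if m is None:
        return None
    open_tag, body, close_tag = m.groups()

    # (?<!\w) and (?!\w) behave like word boundaries, so they also succeed
    # at the start or end of the text and next to punctuation.
    pattern = re.compile(
        r'(?<!\w)(?:(?P<w1>%s)|(?P<w2>%s))(?!\w)'
        % (re.escape(left.strip()), re.escape(right.strip())),
        re.IGNORECASE,
    )

    seen = set()

    def wrap(match):
        tag = match.lastgroup  # 'w1' or 'w2', whichever alternative matched
        seen.add(tag)
        return '<%s>%s</%s>' % (tag, match.group(0), tag)

    tagged = pattern.sub(wrap, body)
    # Keep the line only if both terms occurred at least once.
    if seen != {'w1', 'w2'}:
        return None
    return open_tag + tagged + close_tag

def tag_lines(lines, left, right):
    """Tag every line and drop the ones that do not match both terms."""
    results = (tag_line(line, left, right) for line in lines)
    return [r for r in results if r is not None]
```

Both terms are replaced in a single pass, so the second substitution can never match inside a tag inserted by the first one. The terms go through `re.escape`, so characters such as `.` or `(` in a term are matched literally.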
s element to be on a single line. You can only get away with finding elements yourself if you take into account literal strings, CDATA sections, processing instructions, etc., but why would you want to when XML parsers already do that for you? There is a learning curve to using them, as well as XSLT (for modifying the documents the way you want), but it is sooooooo worth it!
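As a rough illustration of the parser-based route, here is a sketch using Python's standard `xml.etree.ElementTree` module instead of XSLT. It assumes the file has a single root element (called `doc` here, which is a made-up name) and that each `s` element contains only text. `tag_element` is a hypothetical helper, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

def tag_element(elem, left, right):
    """Wrap matches of `left`/`right` in elem's text in <w1>/<w2> children.
    Returns True only if both terms were found; otherwise elem is untouched."""
    pattern = re.compile(
        r'(?<!\w)(?:(?P<w1>%s)|(?P<w2>%s))(?!\w)'
        % (re.escape(left.strip()), re.escape(right.strip())),
        re.IGNORECASE,
    )
    text = elem.text or ''
    matches = list(pattern.finditer(text))
    if {m.lastgroup for m in matches} != {'w1', 'w2'}:
        return False

    # Text before the first match stays as the element's own text.
    elem.text = text[:matches[0].start()]
    for i, m in enumerate(matches):
        child = ET.SubElement(elem, m.lastgroup)  # <w1> or <w2>
        child.text = m.group(0)
        # Text up to the next match (or the end) becomes the child's tail.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        child.tail = text[m.end():end]
    return True

# Assumed sample input; the <doc> wrapper is hypothetical.
xml_source = (
    '<doc>'
    '<s id="69-7">British Meanwhile is the studio 7 album by british pop band 10cc 7.</s>'
    '<s id="21120-99">Terior horn cells, motoneurons located in the spinal.</s>'
    '</doc>'
)
root = ET.fromstring(xml_source)
for s in root.iter('s'):
    if tag_element(s, 'british', '7'):
        print(ET.tostring(s, encoding='unicode'))
```

Because the parser hands over the element's text separately from its attributes, the `id` value is never searched, and an `s` element that spans several lines is handled the same way as one on a single line.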