Part of the original text is shown below; it is stored in a txt file. It looks like HTML source code, but it is incomplete.
<span style="cursor:pointer" onmousedown="HI466('1056').click()">Steffen Eddine (PhD) (SEED)</span></span></div><script>HI466("100256").checked=T</script><div id=“k62” style="left:95px;top:15px;width:32;height:25;"><span id="321" name="021"><span style="cursor:pointer" onmousedown="HI466('2321').click()">Petra Schmidt (PESC)</span></span></div><script>HI466("239021").checked=T</script><div id=“k62” style="left:65px;top:15px;width:32;height:25;"><span id="306" name="366"><span style="cursor:pointer" onmousedown="HI466('2366').click()">Peter Kumar (PEKU)</span></span></div><script>HI466("230866").checked=T</script><div id=“k62” style="left:25px;top:35px;width:32;height:25;"><span id="425" name="511"><span style="cursor:pointer" onmousedown="HI466('2421').click()">Raksha Khaldoun (RAKH)</span></span></div><script>HI466("242511").checked=T</script><div id=“k62” style="left:95px;top:35px;width:32;height:25;"><span id="176" name="146"><span style="cursor:pointer" onmousedown="HI466('2176').click()">Yash Chevalier (YACH)</span>
What I want is to extract the names, such as “Steffen Eddine (PhD) (SEED)”, from that text.
Obviously they all begin with “
import re
with open("original_text.txt", "r") as myfile:
    data = myfile.read()
# Single quotes around the pattern so the double quotes inside it do not end the string
aa = re.search('<span style="cursor:pointer" onmousedown="', data)
How can I pick them out? (I tried BeautifulSoup too, but without much success.)
User Aaron submitted the code below. I found it very close to what I need.
However, it only returns the text <span style="cursor:pointer" onmousedown=" five times, not the names. What else do I need to do?
for m in re.finditer('<span style="cursor:pointer" onmousedown="', data, re.IGNORECASE | re.MULTILINE):
    # group(0) is the whole match, i.e. only the literal prefix searched for
    print(m.group(0))
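One possible direction, as a minimal sketch only: group(0) is always the whole match, so a pattern that contains only the prefix can only ever return the prefix. Adding a capture group after the opening tag should return the text up to the next </span>. The sketch assumes every name sits directly inside a span whose opening tag has exactly the style and onmousedown attributes shown in the sample, and that the onmousedown value contains no double quotes. The sample string and the variable names (pattern, names) are illustrative, not part of the original file.

```python
import re

# Sample input copied from the fragment above, shortened to two entries.
data = (
    '<span style="cursor:pointer" onmousedown="HI466(\'1056\').click()">'
    'Steffen Eddine (PhD) (SEED)</span></span></div>'
    '<script>HI466("100256").checked=T</script>'
    '<span id="321" name="021">'
    '<span style="cursor:pointer" onmousedown="HI466(\'2321\').click()">'
    'Petra Schmidt (PESC)</span></span></div>'
)

# [^"]* skips the onmousedown value up to its closing quote.
# (.*?) is a non-greedy capture group: it keeps everything between the end of
# the opening tag and the first following </span> as group 1.
pattern = re.compile(
    r'<span style="cursor:pointer" onmousedown="[^"]*">(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)

# group(1) is the captured name; group(0) would still be the whole match.
names = [m.group(1).strip() for m in pattern.finditer(data)]
for name in names:
    print(name)
```

With the real file, data would come from myfile.read() as in the code above. A regular expression like this is fragile if the attribute order or spacing changes, so an HTML parser may be the safer choice for less regular input.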