I am trying to extract text from an HTML file.
The HTML file looks like this:
<li class="toclevel-1 tocsection-1">
<a href="#Baden-Württemberg"><span class="tocnumber">1</span>
<span class="toctext">Baden-Württemberg</span>
</a>
</li>
<li class="toclevel-1 tocsection-2">
<a href="#Bayern">
<span class="tocnumber">2</span>
<span class="toctext">Bayern</span>
</a>
</li>
<li class="toclevel-1 tocsection-3">
<a href="#Berlin">
<span class="tocnumber">3</span>
<span class="toctext">Berlin</span>
</a>
</li>
I want to extract the text from the last span tag in each list item.
For the first item that would be "Baden-Württemberg", the text inside the span with class="toctext". I then want to collect these strings in a Python list.
In Python I tried the following:
names = soup.find_all("span",{"class":"toctext"})
My output is this list:
[<span class="toctext">Baden-Württemberg</span>, <span class="toctext">Bayern</span>, <span class="toctext">Berlin</span>]
So how can I extract only the text between the tags?
Thanks to all
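
A minimal sketch of one way this might be done, assuming Beautiful Soup 4 (the bs4 package) and Python's built-in "html.parser". The html string below is only a stand-in for the file's contents, copied from the snippet in the question. find_all returns Tag objects, and calling get_text() on each one should give just the text between the tags:

```python
from bs4 import BeautifulSoup

# Stand-in for the real file contents (assumption: same structure as in the question).
html = """
<li class="toclevel-1 tocsection-1">
<a href="#Baden-Württemberg"><span class="tocnumber">1</span>
<span class="toctext">Baden-Württemberg</span>
</a>
</li>
<li class="toclevel-1 tocsection-2">
<a href="#Bayern">
<span class="tocnumber">2</span>
<span class="toctext">Bayern</span>
</a>
</li>
<li class="toclevel-1 tocsection-3">
<a href="#Berlin">
<span class="tocnumber">3</span>
<span class="toctext">Berlin</span>
</a>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns Tag objects; get_text() pulls out only the text inside each tag.
names = [span.get_text(strip=True) for span in soup.find_all("span", class_="toctext")]

print(names)  # ['Baden-Württemberg', 'Bayern', 'Berlin']
```

The same list comprehension should also work with the original call, soup.find_all("span", {"class": "toctext"}), since both forms filter on the class attribute.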