1

I am trying to extract the text from a html file. The html file looks like this:

<li class="toclevel-1 tocsection-1">
    <a href="#Baden-Württemberg"><span class="tocnumber">1</span>
        <span class="toctext">Baden-Württemberg</span>
    </a>
</li>
<li class="toclevel-1 tocsection-2">
    <a href="#Bayern">
        <span class="tocnumber">2</span>
        <span class="toctext">Bayern</span>
    </a>
</li>
<li class="toclevel-1 tocsection-3">
    <a href="#Berlin">
        <span class="tocnumber">3</span>
        <span class="toctext">Berlin</span>
    </a>
</li>

I want to extract the last text from the last spantag. In the first line it would be "Baden-Würtemberg" after class="toctext"and then put it to a python list.

in Python I tried the following:

names = soup.find_all("span",{"class":"toctext"})

My output the is this list:

[<span class="toctext">Baden-Württemberg</span>, <span class="toctext">Bayern</span>, <span class="toctext">Berlin</span>]

So how can I extract only the text between the tags?

Thanks to all

2 Answers 2

2

The find_all method returns a list. Iterate over the list to get the text.

for name in names:
    print(name.text)

Returns:

Baden-Württemberg
Bayern
Berlin

The builtin python dir() and type() methods are always handy to inspect an object.

print(dir(names))

[...,
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort',
 'source']
Sign up to request clarification or add additional context in comments.

Comments

0

With a list of comprehension you could do the following :

names = soup.find_all("span",{"class":"toctext"})
print([x.text for x in names])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.