Find link in specific tags using python, beautiful soup and lambda functions

Question

I have the following HTML and I am using bs4 (Beautiful Soup) in Python 3 to extract all hrefs from links that are located inside a specific tag: <article>. It should not matter whether an <article> contains one link, several, or none. In a further step, I want to filter out links that don't end in "base.html".

<article>
   <a href='link/base.html'>click me!</a>
</article>
...
<article>
   <a href='link2/base.html'>click me!</a>
</article>
...
<article>
   <a href='link3/base.html'>click me!</a>
</article>

This is my code

page = bs4.BeautifulSoup(source, 'html.parser')
articles = page.find_all(name="article")
article_links = map(lambda article: article.a, articles)
article_links = map(lambda tag: tag.get('href'), article_links)
article_links = filter(lambda link: 'base.html' in link, article_links)
article_links = map(lambda link: url + link, article_links)

However, this results in an

AttributeError: 'NoneType' object has no attribute 'get'

at the .get('href') part in line 4. Other variations result in different errors. The solution needs to use lambda functions. Preferably, I would also like to combine the first two lambda functions into one.

HedgeHog · Accepted Answer · 2022-02-09 20:31:14Z

I'm not sure why lambdas are required, so just in case: you can select your targets more specifically with CSS selectors and iterate over the result set with a list comprehension:

[url+a['href'] for a in page.select('article a[href*="base.html"]')]
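One thing to be aware of: the question asks for links that end in "base.html", while the *= operator matches "base.html" anywhere in the href. If that difference matters for your pages, the $= operator (attribute value ends with) is probably the closer fit. A small sketch, with a made-up second link that only contains the string:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second href contains "base.html" but does not end with it.
html = '''<article>
   <a href='link/base.html'>ends with base.html</a>
   <a href='link/base.html?page=2'>only contains base.html</a>
</article>'''

page = BeautifulSoup(html, 'html.parser')

# *= matches the substring anywhere in the attribute value
contains = [a['href'] for a in page.select('article a[href*="base.html"]')]

# $= matches only when the attribute value ends with the string
ends_with = [a['href'] for a in page.select('article a[href$="base.html"]')]

print(contains)   # ['link/base.html', 'link/base.html?page=2']
print(ends_with)  # ['link/base.html']
```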

Example

from bs4 import BeautifulSoup

url = 'http://www.example.com/'
html = '''<article>
   <a href='link/base.html'>click me!</a>
</article>
...
<article>
   <a href='link2/base.html'>click me!</a>
</article>
...
<article>
   <a href='link3/base.html'>click me!</a>
</article>'''

page = BeautifulSoup(html, 'html.parser')

[url+a['href'] for a in page.select('article a[href*="base.html"]')]

Output

['http://www.example.com/link/base.html',
 'http://www.example.com/link2/base.html',
 'http://www.example.com/link3/base.html']
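If it really has to be lambdas, here is a rough sketch of how that could look. The AttributeError most likely comes from an article without any link: article.a is None in that case, and None has no .get(). Using find_all('a', href=True) inside a single lambda sidesteps this, because it returns an empty list instead of None, and it also picks up articles with more than one link. The base URL and the sample markup (one article without a link, one with two links) are assumptions for illustration only:

```python
from itertools import chain

from bs4 import BeautifulSoup

url = 'http://www.example.com/'  # assumed base URL, as in the example above

# Assumed sample markup: one article has no link, another has two.
html = '''<article>
   <a href='link/base.html'>click me!</a>
</article>
<article>
   <p>no link in this one</p>
</article>
<article>
   <a href='other/index.html'>skip me</a>
   <a href='link2/base.html'>click me!</a>
</article>'''

page = BeautifulSoup(html, 'html.parser')
articles = page.find_all('article')

# One lambda instead of the first two: for each article, collect the href of
# every <a> that has one. An article without links yields an empty list.
hrefs_per_article = map(
    lambda article: [a['href'] for a in article.find_all('a', href=True)],
    articles,
)

# Flatten the per-article lists into a single stream of hrefs.
hrefs = chain.from_iterable(hrefs_per_article)

# Keep only links ending in base.html, then prepend the base URL.
article_links = filter(lambda link: link.endswith('base.html'), hrefs)
article_links = list(map(lambda link: url + link, article_links))

print(article_links)
# ['http://www.example.com/link/base.html', 'http://www.example.com/link2/base.html']
```

Note that map and filter are lazy in Python 3, so the final list(...) is what actually runs the pipeline. I would still prefer the CSS selector version, since it does the same in one line.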

edited Feb 9, 2022 at 20:31

answered Feb 9, 2022 at 20:19

HedgeHog
