
I am using bs4 to scrape a website, and I have this expression: links = ['https://example.com/' + link['href'] for link in school.findAll('a')]

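I need to add a condition so that a link is added to links only if it has an href attribute. In an ordinary loop, the check would look like this:

if link.has_attr('href'):
    # only tags that actually carry an href are appended
    links.append('https://example.com/' + link['href'])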

I tried putting the condition inside the list comprehension like this, but it did not work:

links = ['https://example.com/' + link['href'] if link.has_attr('href') for link in school.findAll('a')]

  • ['https://example.com/' + link['href'] for link in school.findAll('a') if link.has_attr('href')] Commented Aug 2, 2019 at 6:22
  • @Sraw Done! Thanks :) Commented Aug 2, 2019 at 6:30

2 Answers

There are two possible ways:

1.

Pass href=True so that only the a tags that have an href attribute are returned:

findAll('a',href=True)

2.

Use a list comprehension and put the if condition at the end, after the for clause:

['https://example.com/' + link['href'] for link in school.findAll('a') if link.has_attr('href')] 

If you are using bs4, prefer the find_all() method over findAll(); findAll() is the older camelCase name kept for backward compatibility.
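
To compare the two approaches above, here is a minimal sketch. The HTML snippet and the variable names base, links_filtered and links_comprehension are made up for illustration; school stands in for whatever tag or soup object you already have.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <a> has no href attribute.
html = """
<div id="school">
  <a href="about">About</a>
  <a name="anchor-only">No href here</a>
  <a href="contact">Contact</a>
</div>
"""

school = BeautifulSoup(html, "html.parser")
base = "https://example.com/"

# Approach 1: let find_all() skip tags that lack an href attribute.
links_filtered = [base + a["href"] for a in school.find_all("a", href=True)]

# Approach 2: fetch every <a> and filter in the comprehension.
# The filtering `if` goes after the `for` clause.
links_comprehension = [
    base + a["href"] for a in school.find_all("a") if a.has_attr("href")
]

print(links_filtered)
# ['https://example.com/about', 'https://example.com/contact']
print(links_filtered == links_comprehension)
# True
```

The attempt in the question fails because an `if` placed before `for` is parsed as a conditional expression, which requires an `else` branch; a trailing `if` is the filter form.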


Another way is to select only the a tags that have an href attribute, using a CSS selector:

links = ['https://example.com/' + link['href'] for link in school.select('a[href]')]
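
As a quick check of the selector, here is a small sketch on made-up markup (the HTML and the name base are assumptions for illustration). select() relies on the CSS selector support that ships with recent bs4 releases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only two of the three <a> tags carry an href.
html = """
<ul>
  <li><a href="page1">Page 1</a></li>
  <li><a id="no-link">Not a link</a></li>
  <li><a href="page2">Page 2</a></li>
</ul>
"""

school = BeautifulSoup(html, "html.parser")
base = "https://example.com/"

# 'a[href]' matches <a> elements that have an href attribute,
# so no has_attr() check is needed inside the comprehension.
links = [base + link["href"] for link in school.select("a[href]")]

print(links)
# ['https://example.com/page1', 'https://example.com/page2']
```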
