
I'm trying to isolate the link to one specific image on a web page, but can't quite get there. The HTML looks something like:

<head>
   <img alt="Generic title" src="https://genericURL/photo/picture.jpg/"> 
   <img src="https://genericurl/.../">
   <img src="https://genericurl/.../">
   ....

I am able to return many links, but the one I specifically want is the top one shown; it is the only link containing /photo/picture.jpg. I have tried the answer from "Find specific link text with bs4" and other variations, but haven't figured it out yet. Is anyone able to take a look, please?

My code:

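import re

# Raw string so that \d reaches the regex engine unchanged
links = soup.findAll('img', {'src': re.compile(r'^http://image\d+')})
for link in links:
    print(link['src'])  # an <img> has no text; its URL lives in the src attribute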

EDIT: Using the suggestions, I realised that the link format looked different depending on how I was inspecting it. When I printed the entire web page I saw the link as http://image..., but when I used findAll('img', {'src' ... the link came out as https://img..., so I was passing the wrong pattern to re.compile.
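One way to avoid that kind of mismatch is to print the src values exactly as BeautifulSoup parsed them and write the pattern against those strings. A minimal sketch, assuming hypothetical sample markup modelled on the question (the two images/ URLs are placeholders for the elided ones):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the images/ URLs are placeholders, not the real page.
html = """
<img alt="Generic title" src="https://genericURL/photo/picture.jpg/">
<img src="https://genericurl/images/1.jpg">
<img src="https://genericurl/images/2.jpg">
"""

soup = BeautifulSoup(html, "html.parser")

# Collect every src as the parser sees it (src=True skips <img> tags without one).
srcs = [img["src"] for img in soup.find_all("img", src=True)]
for src in srcs:
    print(src)
```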

Why not re.compile("photo/picture.jpg")? Commented Mar 11, 2017 at 1:00
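The suggestion in the comment works because BeautifulSoup tests a compiled pattern with re.search, so it can match anywhere inside src and no "^http..." prefix is needed. A sketch of that idea, assuming hypothetical sample markup; the dot is escaped here so it only matches a literal ".":

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; the images/ URLs are placeholders, not the real page.
html = """
<img alt="Generic title" src="https://genericURL/photo/picture.jpg/">
<img src="https://genericurl/images/1.jpg">
<img src="https://genericurl/images/2.jpg">
"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None if nothing matches.
img = soup.find("img", src=re.compile(r"/photo/picture\.jpg"))
link_i_want = img["src"] if img is not None else None
print(link_i_want)
```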

2 Answers

soup.find_all("img", alt="Generic title")

You should use the alt attribute as the filter.
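A self-contained version of that filter, pulling out the URL as well. It assumes hypothetical sample markup and that "Generic title" is unique on the page; if several images share that alt text, all of their src values end up in the list:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the images/ URLs are placeholders, not the real page.
html = """
<img alt="Generic title" src="https://genericURL/photo/picture.jpg/">
<img src="https://genericurl/images/1.jpg">
<img src="https://genericurl/images/2.jpg">
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword arguments to find_all filter on attribute values.
matches = soup.find_all("img", alt="Generic title")
srcs = [img["src"] for img in matches]
print(srcs)
```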

import re

link_i_want = None  # stays None if no image matches
for link in soup.find_all('img', src=True):
    # <img> tags carry their URL in src, not href
    if re.search(r'photo/picture\.jpg', link['src'], re.IGNORECASE):
        link_i_want = link['src']
        break
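A different technique for the same filter is a CSS attribute selector: [src*="..."] matches elements whose src contains the given substring, and select_one returns the first match or None. A sketch assuming hypothetical sample markup and BeautifulSoup 4.7 or later (where CSS selection is handled by soupsieve):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the images/ URLs are placeholders, not the real page.
html = """
<img alt="Generic title" src="https://genericURL/photo/picture.jpg/">
<img src="https://genericurl/images/1.jpg">
<img src="https://genericurl/images/2.jpg">
"""

soup = BeautifulSoup(html, "html.parser")

# *= is a plain, case-sensitive substring test, so no regex escaping is needed.
img = soup.select_one('img[src*="/photo/picture.jpg"]')
link_i_want = img["src"] if img is not None else None
print(link_i_want)
```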
