I am scraping data from a site and need the URL of one image. I can find the <img> element, but the output is the whole tag rather than the URL I need.

I have looked online for solutions and tried changing the code, but nothing worked.

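import requests
from bs4 import BeautifulSoup

# baseurl holds the address of the page being scraped (defined elsewhere)
r = requests.get(baseurl)
content = r.content
soup = BeautifulSoup(content, "html.parser")

# Take the second <img> element on the page
images = soup.findAll('img')[1]
print(images)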

Output I get:

<img src="https://cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png" style="border-radius: 5px"/>

Output I need:

cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png

(I also tried print(images.text), but that did not give me the URL.)

  • Parse the src attribute from your <img> element. Commented Jul 8, 2019 at 22:24
  • Try images.get('src'). Commented Jul 8, 2019 at 22:28

2 Answers

You can get the value of the img tag's src attribute using:

images = soup.findAll('img')[1]
print(images.get("src"))

or

images = soup.findAll('img')[1]
print(images['src'])

Output

https://cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png
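
The URL printed above still carries the https:// scheme, whereas the question asks for the bare host and path. A minimal sketch of one way to drop it with the standard library's urllib.parse; the helper name strip_scheme is hypothetical and not part of the original answer:

```python
from urllib.parse import urlsplit


def strip_scheme(url):
    """Return host + path of a URL without its scheme (hypothetical helper)."""
    parts = urlsplit(url)
    # netloc is the host part, path is what follows it;
    # any query string or fragment is dropped
    return parts.netloc + parts.path


src = "https://cdn.rubyrealms.com/images/WKpivrdGBJJ9p6etIY2aJpixikFj4vnpmpPR9pXjK4Y8K.png"
print(strip_scheme(src))
```

In the scraper itself, this would presumably be called as strip_scheme(images.get("src")).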

print(images.text) does not work because .text returns the text between an element's opening and closing tags, whereas the URL you want is stored in an attribute of the tag itself.
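
To illustrate the difference, here is a small sketch that runs on an inline HTML string instead of a live page. The markup and the example.com URL are made up for illustration:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the scraped page (an assumption for illustration)
html = '<p>caption</p><img src="https://example.com/a.png" style="border-radius: 5px"/>'
soup = BeautifulSoup(html, "html.parser")

paragraph = soup.find("p")
img = soup.find("img")

print(repr(paragraph.text))  # the text enclosed by <p> ... </p>
print(repr(img.text))        # <img> encloses no text, so this is an empty string
print(img.get("src"))        # the attribute value
print(img.get("alt"))        # .get() returns None for a missing attribute
```

Note that img["alt"] would raise a KeyError here, so .get() is the safer choice when an attribute may be absent.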

Hope this helps you :)

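Here's a sample from the html.parser documentation that you can adapt. It assumes parser is an instance of an HTMLParser subclass that prints each start tag and its attributes: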

parser.feed('<img src="python-logo.png" alt="The Python logo">')
Start tag: img
attr: ('src', 'python-logo.png')

REFERENCE: https://docs.python.org/3/library/html.parser.html
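
The snippet above does not show how parser is defined. A minimal self-contained sketch of the same idea using only the standard library; the class name ImgSrcParser and its sources attribute are hypothetical, chosen here for illustration:

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    self.sources.append(value)


parser = ImgSrcParser()
parser.feed('<img src="python-logo.png" alt="The Python logo">')
print(parser.sources)
```

For the page in the question, one could pass r.text to parser.feed() and then pick the wanted entry from parser.sources.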