Basically, I need to extract the src="" values from all <script> tags in an HTML document.

<script src="path/to/example.js" type="text/javascript"></script>

Unfortunately, bs4 cannot do that. Any ideas on how I can achieve this?

  • Of course bs4 can do it. Which part of it are you lost on: finding all script tags, or extracting the src attribute? Commented May 28, 2019 at 12:45
  • On extracting the src. I used the following code: `test = soup.find_all('script')` then `links = [link['src'] for link in test]`, and received the error `KeyError: 'src'`. The same approach worked fine with other tags. Commented May 28, 2019 at 12:58
  • Meaning there is at least one script tag without a src attribute, probably has some inline javascript. Try [link['src'] for link in test if 'src' in link] Commented May 28, 2019 at 13:05
  • Error no more, but it returns an empty list Commented May 28, 2019 at 13:08
  • We'll need a look at the HTML. Can you provide a link? Commented May 28, 2019 at 13:09

2 Answers

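import requests
import bs4
# Download the page and parse it with the built-in HTML parser
text = requests.get('http://example.com').text
soup = bs4.BeautifulSoup(text, features='html.parser')
# Collect every <script> tag, then keep only those that carry a src attribute
scripts = soup.find_all('script')
srcs = [link['src'] for link in scripts if 'src' in link.attrs]
print(srcs)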
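
The check against `link.attrs` matters here. In Beautiful Soup, `'src' in link` tests the tag's children rather than its attributes, which would explain the empty list reported in the comments. Below is a minimal offline sketch of that difference; the inline HTML string is made up for illustration and the variable names are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one external script and one inline script
html = """
<script src="path/to/example.js" type="text/javascript"></script>
<script>console.log("inline");</script>
"""

soup = BeautifulSoup(html, "html.parser")
scripts = soup.find_all("script")

# `in` on a Tag looks through the tag's children, so this never matches an attribute
by_children = [s["src"] for s in scripts if "src" in s]

# These two checks look at the attribute dictionary instead
by_attrs = [s["src"] for s in scripts if "src" in s.attrs]
by_has_attr = [s["src"] for s in scripts if s.has_attr("src")]

print(by_children)
print(by_attrs)
print(by_has_attr)
```

Either `'src' in tag.attrs` or `tag.has_attr('src')` should work; `tag.get('src')` is another option when a `None` for missing attributes is acceptable.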

I would condense this and use the CSS selector script[src], which matches only script tags that have a src attribute.

import requests
from bs4 import BeautifulSoup as bs
r = requests.get('http://example.com').content
soup = bs(r, 'lxml') # 'html.parser' if lxml not installed
srcs = [item['src'] for item in soup.select('script[src]')]
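
The same selector can be tried without a network request. The sketch below runs it on a made-up HTML string and, as an optional extra step that is not part of the answer above, resolves relative paths against an assumed page URL with `urllib.parse.urljoin`. The `base_url` value and the sample markup are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page address, used only to resolve relative src values
base_url = "http://example.com/page/"

# Hypothetical sample markup: two external scripts and one inline script
html = """
<script src="path/to/example.js" type="text/javascript"></script>
<script src="/static/app.js"></script>
<script>console.log("inline");</script>
"""

soup = BeautifulSoup(html, "html.parser")

# script[src] skips the inline script because it has no src attribute
srcs = [item["src"] for item in soup.select("script[src]")]

# Turn relative paths into absolute URLs
absolute_srcs = [urljoin(base_url, src) for src in srcs]

print(srcs)
print(absolute_srcs)
```

Because the selector already filters on the attribute, the list comprehension needs no extra `if` clause.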
