
Python Script

import re
import json

import requests
from bs4 import BeautifulSoup

url = 'https://www.dunelm.com/product/caldonia-check-natural-eyelet-curtains-1000187301?defaultSkuId=30729125'

# Fetch the product page and stop early on an HTTP error status
r = requests.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.content, 'html.parser')

# Save the parsed source code to a file for testing
with open("sourcecode.html", "w", encoding='utf-8') as file:
    file.write(str(soup))

# Regex pattern to capture the JSON data within the page source
# (the pattern tested on regex101.com, with the stray "*" after the quote and after "false" removed)
regex_pattern = r'\{"delivery".*false\}\}\}'

URL: https://www.dunelm.com/product/caldonia-check-natural-eyelet-curtains-1000187301?defaultSkuId=30729125

I'm trying to use a regex to pull out the JSON data embedded in the page source of the URL above.

I manually copied the page source from that URL into regex101.com and tested it with the following regex pattern:

{\"delivery\"*.*false*}}}

On regex101 the pattern appears to capture exactly the JSON data I need.
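To see what the tested pattern actually matches, it can be checked offline. The sketch below runs the regex101 pattern and a tidier variant against an invented one-line snippet (not Dunelm's real markup). The variant pattern_tidy is my own assumption of what was intended: in the original, \"* means "zero or more quotes" and false* means "fals followed by zero or more e", so it is looser than it looks. Note also that . does not match newlines unless re.DOTALL is passed, so the JSON has to sit on one line of the source for either pattern to work.

```python
import re

# The pattern exactly as tested on regex101.com.
pattern_as_tested = r'{\"delivery\"*.*false*}}}'
# Assumed intent: the "*" after the quote and after "false" were not meant as quantifiers.
pattern_tidy = r'\{"delivery".*false\}\}\}'

# Hypothetical one-line stand-in for the page source (not the real Dunelm markup).
sample = '<script>window.data = {"delivery":{"free":false,"opts":{"express":false}}};</script>'

m1 = re.search(pattern_as_tested, sample)
m2 = re.search(pattern_tidy, sample)
print(m1.group(0))  # {"delivery":{"free":false,"opts":{"express":false}}}
print(m1.group(0) == m2.group(0))  # True

# The loose quantifiers let the tested pattern accept text that is not the target:
# '"*' allows zero quotes after "delivery", and 'e*' allows "fals" with no "e".
print(re.search(pattern_as_tested, '{"deliveryX fals}}}') is not None)  # True
print(re.search(pattern_tidy, '{"deliveryX fals}}}') is not None)  # False
```

On this input both patterns return the same substring; they only differ on malformed text like the last two lines.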

Issue

When I inspect the soup variable or the saved file, it contains the full HTML source of the page.
However, I do not know how to apply the regex so that it captures only the JSON string I need to build a Python dictionary.
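A minimal sketch of the missing step, assuming the JSON sits on a single line of the page: search the raw text (r.text) rather than the re-serialised soup, take the whole match with group(0), and hand that string to json.loads. The helper name extract_delivery_json and the sample HTML are hypothetical.

```python
import json
import re
from typing import Optional

# Tidied form of the question's pattern (assumption: the stray "*" quantifiers were unintended).
DELIVERY_PATTERN = re.compile(r'\{"delivery".*false\}\}\}')


def extract_delivery_json(page_text: str) -> Optional[dict]:
    """Hypothetical helper: return the first {"delivery": ...} object as a dict, or None."""
    match = DELIVERY_PATTERN.search(page_text)
    if match is None:
        return None
    # group(0) is the whole matched substring, i.e. just the JSON text.
    return json.loads(match.group(0))


# With the live page this would be extract_delivery_json(r.text);
# the string below is an invented stand-in for the page source.
sample = '<html><script>var d = {"delivery":{"free":false,"opts":{"express":false}}};</script></html>'
data = extract_delivery_json(sample)
print(type(data).__name__, data["delivery"]["opts"])  # dict {'express': False}
```

Because .* is greedy, if several {"delivery"...} objects appear on the same line the match may run from the first one to the last and not be valid JSON; in that case json.loads raises json.JSONDecodeError.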

Any help would be greatly appreciated.

  • Have you tried this: for content in soup.find_all(re.compile("__your_re_pattern")): print(content) Commented Aug 30, 2021 at 12:55

1 Answer


Maybe something like this can help you:

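import re

import requests

url = 'https://www.dunelm.com/product/caldonia-check-natural-eyelet-curtains-1000187301?defaultSkuId=30729125'

r = requests.get(url)
source_text = r.text  # raw HTML as a str; BeautifulSoup is not needed for a regex search

# Regex to extract the info; the result is named `matches` so it does not shadow the json module
regex_pattern = 'put your regex here'  # e.g. the pattern from the question
matches = re.findall(regex_pattern, source_text)

To convert the returned list to a JSON string, you can use:

import json
json_format = json.dumps(matches)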
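For reference, here is what the two steps in this answer return when run on an invented stand-in for the page text (the pattern is the question's, tidied, as an assumption): re.findall gives a list of matched strings, and json.dumps turns that list into a JSON array string, not a dictionary.

```python
import json
import re

# Invented stand-in for the page text; the real script would use r.text.
source_text = '<script>x = {"delivery":{"a":{"b":false}}};</script>'
matches = re.findall(r'\{"delivery".*false\}\}\}', source_text)
print(type(matches).__name__, matches)
# list ['{"delivery":{"a":{"b":false}}}']

json_format = json.dumps(matches)
print(json_format)
# ["{\"delivery\":{\"a\":{\"b\":false}}}"]
print(type(json_format).__name__)  # str
```

The inner JSON ends up escaped inside a one-element array, so json.loads(json_format) only gives the list back.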

1 Comment

Excellent... The code above pulls the JSON portion out of the source code, but as a list. How do I convert the list to a dictionary? In the past I would use something like data = json.loads(json), but this throws an error.
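The error is expected: json.loads takes a single str (or bytes), while re.findall returns a list of such strings. Also, if the list is stored in a variable named json (as the json.loads(json) call suggests), it shadows the module, so the call fails with an AttributeError, or with a TypeError once import json has rebound the name to the module. A small sketch with an invented match shows the failure and the fix of indexing into the list first.

```python
import json

# Stand-in for the list returned by re.findall(...) (invented sample data).
matches = ['{"delivery":{"free":false}}']

error_message = ""
try:
    json.loads(matches)  # json.loads needs a str/bytes, not a list
except TypeError as exc:
    error_message = str(exc)
print(error_message)
# the JSON object must be str, bytes or bytearray, not list

# Take the matched string out of the list first, then parse it into a dict.
data = json.loads(matches[0]) if matches else None
print(data)  # {'delivery': {'free': False}}
```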
