Python Script
import requests
import json
from bs4 import BeautifulSoup
import re
url = 'https://www.dunelm.com/product/caldonia-check-natural-eyelet-curtains-1000187301?defaultSkuId=30729125'
r = requests.get(url)
soup = BeautifulSoup(r.content,'html.parser')
# Save source code to file for testing
with open("sourcecode.html", "w", encoding='utf-8') as file:
    file.write(str(soup))
# Regex pattern to capture JSON data within webpage source code
regex_pattern = r"{\"delivery\"*.*false*}}}"
I'm trying to extract the JSON data embedded in the page source of the URL above using a regular expression.
I manually copied the page source from that URL into regex101.com and tested it against the following pattern:
{\"delivery\"*.*false*}}}
On regex101.com, the pattern appears to capture the JSON data I need.
Issue
When I inspect the soup variable or the saved file, it appears to contain the page's HTML source code.
However, I don't know how to apply the regex in Python so that it captures only the JSON string, which I need in order to build a Python dictionary.
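To make the goal concrete, here is a minimal sketch of the kind of processing I have in mind, assuming the JSON sits inside a script tag as one contiguous string. The sample_html string, its keys, and the window.__DATA__ name are made up for illustration and are not taken from the real page. The pattern is a tidied variant of mine: in the original, the * after \" and after false applies only to the single preceding character, so here the braces are escaped and a lazy .*? is used instead.

```python
import json
import re

# Hypothetical stand-in for the downloaded page source (not the real page).
sample_html = (
    '<html><body><script>window.__DATA__ = '
    '{"delivery":{"standard":{"available":true,"express":false}}}'
    ';</script></body></html>'
)

# Start at {"delivery" and match lazily up to the first false}}}
# DOTALL lets . cross newlines in case the JSON spans several lines.
pattern = re.compile(r'\{"delivery".*?false\}\}\}', re.DOTALL)

match = pattern.search(sample_html)
if match:
    # group(0) is the full matched text, which should be valid JSON
    data = json.loads(match.group(0))
else:
    data = None

print(data)
```

With the real page, sample_html would presumably be replaced by r.text or str(soup). I'm unsure whether ending the match at false}}} is reliable, since json.loads raises an error if the captured text is not complete, valid JSON.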
Any help would be greatly appreciated.