I am trying to extract a JSON object from a page with BeautifulSoup, but I get an error. I can't cleanly remove the "window.pageData=" prefix that precedes the JSON. I also tried the .replace method to strip "window.pageData=", but that raised an error as well. My code:

    link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    all_scripts = soup.find_all('script')
    my_script=all_scripts[3]
    jsData = re.search(r'window.pageData=', my_script.text)
    data = json.loads(jsData.group(1))

Here is the script I am trying to parse:

    <script>window.pageData={
     "mods": {
       "listItems": [
         {
           "name": "Mint leaf Powder (পুদিনা পাতা গুড়া) (১০০গ্রাম)- Pudina Pata Gura",
           "nid": "125018674",
           "productUrl": "//www.daraz.com.bd/products/mint-leaf-powder-pudina-pata-gura-i125018674-s1045213986.html?search=1",
           "image": "https://static-01.daraz.com.bd/p/e742aabbea46336304f2081a29de1139.jpg",
           "originalPrice": "180.00",
           "originalPriceShow": "৳ 180",
           "price": "171",
         }
       ]
     }
    }</script>
  • You don't need BeautifulSoup here at all. Try this: print(json.loads(re.findall(r"window\.pageData=(.*?)</", r.text)[0])) Commented Sep 18, 2020 at 12:11

1 Answer

I don't know what you tried with .replace(), but this works for me:

    import json
    import re

    import requests
    from bs4 import BeautifulSoup

    link = "https://www.daraz.com.bd/catalog/?q=" + "Pudina"
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    all_scripts = soup.find_all('script')
    # The fourth <script> tag is the one that holds window.pageData here.
    my_script = all_scripts[3]
    # Strip the JavaScript assignment prefix so only the JSON text remains.
    my_script = re.sub(r'window\.pageData=', "", my_script.text)
    # Alternative to re.sub:
    # my_script = my_script.text.replace("window.pageData=", "")
    # print(my_script)
    data = json.loads(my_script)
    print(data)
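
If the position of the script tag changes, indexing with [3] will pick the wrong one. As a rough sketch of an alternative (not what I ran against the live site), you can search the raw HTML for the "window.pageData=" prefix and let json.JSONDecoder.raw_decode parse the single JSON value that starts right after it. SAMPLE_HTML and extract_page_data are made-up names for illustration; in the real code you would pass r.text.

```python
import json
import re

# Hypothetical stand-in for r.text; the real page is much larger.
SAMPLE_HTML = """<html><head>
<script>var other = 1;</script>
<script>window.pageData={
 "mods": {
   "listItems": [
     {"name": "Pudina Pata Gura", "price": "171"}
   ]
 }
}</script>
</head></html>"""


def extract_page_data(html):
    """Return the object assigned to window.pageData, or None if absent."""
    match = re.search(r"window\.pageData\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (a semicolon, </script>, more HTML).
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


page_data = extract_page_data(SAMPLE_HTML)
print(page_data["mods"]["listItems"][0]["price"])
```

This avoids depending on the script's index and on where the closing tag sits, but it still assumes the value after the prefix is strict JSON.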

8 Comments

I got this error: raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The traceback also shows: in loads, return _default_decoder.decode(s), and in decode, obj, end = self.raw_decode(s, idx=_w(s, 0).end())
I've added the whole code, including imports, in case you want it. It seems to work perfectly for me. Could you let me know the Python version and the versions of each library you are using?
I just tried your exact code but still got this error (screenshot).
I edited my code. Could you verify whether print(my_script) still shows window.pageData=?
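
For anyone hitting the same message: "Expecting value: line 1 column 1 (char 0)" means the string passed to json.loads does not begin with a JSON value, and an empty string produces exactly that. One possible cause, not confirmed in this thread, is that all_scripts[3] is not the script holding pageData in the response you receive. A minimal check, assuming you have the script bodies as a list of strings (for example [s.text for s in all_scripts]); find_script_with and script_texts are hypothetical names:

```python
import json


def find_script_with(script_texts, marker="window.pageData="):
    """Return (index, text) of the first script containing marker, else None."""
    for index, text in enumerate(script_texts):
        if marker in text:
            return index, text
    return None


# Hypothetical script bodies standing in for [s.text for s in all_scripts].
script_texts = ["", "var a = 1;", 'window.pageData={"mods": {}}']

found = find_script_with(script_texts)
if found is None:
    print("No script contains window.pageData=; inspect r.status_code and r.text[:200]")
else:
    index, text = found
    print(index, repr(text[:30]))

# An empty script body reproduces the error reported above.
try:
    json.loads(script_texts[0])
except json.JSONDecodeError as exc:
    print(exc)
```

If no script contains the marker at all, the server probably returned a different page than the one seen in the browser, so it is worth looking at the status code and the first few hundred characters of the response.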