How can I use the json module to extract the price from a page that provides the data in JSON format in an inline script?

I tried to extract the price from https://glomark.lk/top-crust-bread/p/13676, but I couldn't get the price value.

Please help me solve this.

import requests
import json

import sys
sys.path.insert(0,'bs4.zip')
from bs4 import BeautifulSoup

user_agent = {
                 'User-agent': 'Mozilla/5.0 Chrome/35.0.1916.47'
                 }
headers = user_agent

url = 'https://glomark.lk/top-crust-bread/p/13676'
req = requests.get(url, headers = headers)
soup = BeautifulSoup(req.content, 'html.parser')

products = soup.find_all("div", class_ = "details col-12 col-sm-12 col-md-6 col-lg-5 col-xl-5")
for product in products:
    product_name = product.h1.text
    product_price = product.find(id = 'product-promotion-price').text
    print(product_name)
    print(product_price)
  • Can you post some code? Commented Mar 28, 2022 at 16:27
  • Please provide enough code so others can better understand or reproduce the problem. Commented Mar 28, 2022 at 16:43
  • You may have the most common problem: the page may use JavaScript to add or update elements, but BeautifulSoup/lxml and requests/urllib can't run JS. You may need Selenium to control a real web browser, which can run JS. Or open DevTools in Firefox/Chrome (Network tab) to see whether JavaScript reads the data from some URL, and then try that URL with requests. JS usually receives JSON, which can easily be converted to a Python dictionary (without BS). You can also check whether the page has a (free) API for programmers. Commented Mar 28, 2022 at 17:13

2 Answers

You can get the JSON data (the price) from the hidden API using only the requests module. The product name, however, is not loaded dynamically, so it can be read from the page's static HTML.

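import requests

# Headers that mark the request as an XMLHttpRequest (AJAX) call
headers = {
    'content-type': 'application/json',
    'x-requested-with': 'XMLHttpRequest'
}

api_url = "https://glomark.lk/product-page/variation-detail/13676"

# The endpoint is queried with a POST request and answers with JSON
jsonData = requests.post(api_url, headers=headers).json()

price = jsonData['price']
print(price)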

Output:

95
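
The lookup `jsonData['price']` raises a `KeyError` if the endpoint ever responds without that field. Below is a minimal sketch of a more defensive lookup, assuming the payload is a dict with an optional `price` key as in the output above. The helper name `price_from_payload` is hypothetical and not part of any library.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, Optional


def price_from_payload(payload: Any) -> Optional[Decimal]:
    """Return the 'price' field as a Decimal, or None if it is missing or invalid."""
    if not isinstance(payload, dict):
        return None
    raw = payload.get("price")
    if raw is None:
        return None
    try:
        # str() lets the same path handle 95, 95.5 and "95"
        return Decimal(str(raw))
    except InvalidOperation:
        # The field was present but did not look like a number
        return None


# Hypothetical payloads used only for illustration
print(price_from_payload({"price": 95}))
print(price_from_payload({"price": "95.50"}))
print(price_from_payload({}))
```

With the request shown above, this could be called as `price_from_payload(jsonData)`; a `None` result then signals that the response did not carry a usable price.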

Full working code:

from bs4 import BeautifulSoup
import requests

# Headers that mark the request as an XMLHttpRequest (AJAX) call
headers = {
    'content-type': 'application/json',
    'x-requested-with': 'XMLHttpRequest'
}

api_url = "https://glomark.lk/product-page/variation-detail/13676"

# The price comes from the JSON returned by the hidden API
jsonData = requests.post(api_url, headers=headers).json()
price = jsonData['price']

# The product name is not dynamic, so read it from the page HTML
url = 'https://glomark.lk/top-crust-bread/p/13676'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'html.parser')

title = soup.select_one('.product-title h1').text
print(title)
print(price)

Output:

Top Crust Bread
95
     

As mentioned, the content is provided dynamically by JavaScript, so one approach is to grab the data directly from the inline JSON script tag, as you already figured out in your question.

data = json.loads(soup.select_one('[type="application/ld+json"]').text)

will give you a dict with product information:

{'@context': 'https://schema.org', '@type': 'Product', 'productID': '13676', 'name': 'Top Crust Bread', 'description': 'Top Crust Bread', 'url': '/top-crust-bread/p/13676', 'image': 'https://objectstorage.ap-mumbai-1.oraclecloud.com/n/softlogicbicloud/b/cdn/o/products/350001--01--1555692328.jpeg', 'brand': 'GLOMARK', 'offers': [{'@type': 'Offer', 'price': '95', 'priceCurrency': 'LKR', 'itemCondition': 'https://schema.org/NewCondition', 'availability': 'https://schema.org/InStock'}]}

Then simply pick the information you need, such as the price:

data['offers'][0]['price']
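
In schema.org markup, `offers` may be a single object as well as a list, so indexing with `[0]` is tied to how this particular page is written. Here is a small sketch that accepts both shapes. The helper name `first_offer_price` is hypothetical, and the sample dict is trimmed from the output shown above.

```python
from typing import Any


def first_offer_price(data: dict) -> Any:
    """Return the price of the first offer that has one, or None."""
    offers = data.get("offers")
    if isinstance(offers, dict):
        # A single Offer object: wrap it so both shapes share one code path
        offers = [offers]
    if not isinstance(offers, list):
        return None
    for offer in offers:
        if isinstance(offer, dict) and "price" in offer:
            return offer["price"]
    return None


# Trimmed copy of the product dict shown above
sample = {
    "@type": "Product",
    "name": "Top Crust Bread",
    "offers": [{"@type": "Offer", "price": "95", "priceCurrency": "LKR"}],
}
print(first_offer_price(sample))  # 95
```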

Example

import json

import requests
from bs4 import BeautifulSoup

url = 'https://glomark.lk/top-crust-bread/p/13676'
response = requests.get(url)
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(response.content, 'html.parser')

# The product data sits in an inline JSON-LD script tag
data = json.loads(soup.select_one('[type="application/ld+json"]').text)

product_price = data['offers'][0]['price']
product_name = data['name']
product_image = data['image']

print(product_name)
print(product_price)
print(product_image)

Output

Top Crust Bread 
95 
https://objectstorage.ap-mumbai-1.oraclecloud.com/n/softlogicbicloud/b/cdn/o/products/350001--01--1555692328.jpeg
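
The same extraction can be tried without a network request or BeautifulSoup. The sketch below swaps in the standard library's `html.parser.HTMLParser` to collect the bodies of `application/ld+json` script tags from an HTML string. The class name `LdJsonCollector` and the trimmed sample page are assumptions made for illustration, not the site's actual markup.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies can arrive in several pieces, so buffer them
        if self._in_ld_json:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


# Hypothetical, trimmed page modeled on the dict shown earlier
sample_html = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Top Crust Bread",
 "offers": [{"@type": "Offer", "price": "95", "priceCurrency": "LKR"}]}
</script>
</head><body><h1>Top Crust Bread</h1></body></html>
"""

collector = LdJsonCollector()
collector.feed(sample_html)
collector.close()

product = collector.documents[0]
print(product["name"])
print(product["offers"][0]["price"])
```

For a live page, the string passed to `feed()` would be the response text instead of `sample_html`. BeautifulSoup remains the shorter option when it is installed.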

1 Comment

Just so I can assess this better: you first accepted the answer and then unaccepted it a moment later. Was that because you wanted to accept both answers, which unfortunately is not possible, or because the answer does not actually fit?
