So, this seems to work. I get a 200 OK response both times, and the content isn't the same length.
For what it's worth, in Firefox, when I click the blue "Shop this store" button, it takes me to what appears to be the exact same page, but without the blue button I just clicked. In Chrome (Beta), when I click the blue button, I get a 403 Access denied page. Their server isn't playing nice. You might struggle to achieve what you want to achieve.
If I call session.get without my headers, I never get a response at all. So they're obviously checking the user-agent, possibly cookies, etc.
import requests

# Browser-like headers; without a browser User-Agent the server never responds.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    # requests can only decode "br" (Brotli) bodies if a Brotli package is installed
    "Accept-Encoding": "gzip, deflate, br",
    "Upgrade-Insecure-Requests": "1",
}

# A Session keeps cookies between requests and reuses the connection.
session = requests.Session()
url = "https://www.lowes.com/store/AK-Anchorage/2955"

response1 = session.get(url, headers=headers, timeout=5)
print(response1, len(response1.content))
response2 = session.get(url, headers=headers, timeout=5)
print(response2, len(response2.content))
Output:
<Response [200]> 56282
<Response [200]> 56323
I've done some more testing. The server times out if you don't change the user-agent from the default Python Requests one. Even changing it to "" seems to be enough for the server to give you a response.
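If you'd rather not depend on requests, the same idea carries over to the standard library's urllib, whose default agent ("Python-urllib/3.x") is probably treated the same way. This is only a sketch: the helper names `build_request` and `fetch` are made up for illustration. It sends a browser-like User-Agent and always passes a timeout, so a server that silently ignores unknown clients produces `None` instead of a script that hangs forever.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional

# Same browser-like User-Agent as in the requests example.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: a GET request that overrides urllib's default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch(url: str, timeout: float = 5.0) -> Optional[bytes]:
    """Hypothetical helper: return the body, or None on timeout, network error or HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except (socket.timeout, TimeoutError, urllib.error.URLError):
        # URLError also covers HTTPError (e.g. a 403 Access denied page).
        return None
```

Calling `fetch("https://www.lowes.com/store/AK-Anchorage/2955")` should then either return the page bytes or give up after about five seconds.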
You can get product information, including description, specifications, and price, without selecting a specific store. Take a look at this GET request, with no cookies, and no session:
import json
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}
url = "https://www.lowes.com/pd/Google-Nest-Learning-Thermostat-3rd-Gen-Thermostat-and-Room-Sensor-with-with-Wi-Fi-Compatibility/1001080012"

# Plain GET: no cookies, no session.
r = requests.get(url, headers=headers, timeout=5)
print("return code:", r)
print("content length:", len(r.content))

for line in r.text.splitlines():
    if "window.digitalData.products = [" in line:
        print("This line includes the 'sellingPrice' and the 'retailPrice'. After some splicing, we can treat it as JSON.")
        # Keep only the text between " = " and the trailing ";".
        left = line.find(" = ") + 3
        right = line.rfind(";")
        print(json.dumps(json.loads(line[left:right]), indent=1))
        break
Output:
return code: <Response [200]>
content length: 107134
This line includes the 'sellingPrice' and the 'retailPrice'. After some splicing, we can treat it as JSON.
[
 {
  "productId": [
   "1001080012"
  ],
  "productName": "Nest_Learning_Thermostat_3rd_Gen_Thermostat_and_Room_Sensor_with_with_Wi-Fi_Compatibility",
  "ivm": "753160-83910-T3007ES",
  "itemNumber": "753160",
  "vendorNumber": "83910",
  "modelId": "T3007ES",
  "type": "ANY",
  "brandName": "Google",
  "superCategory": "Heating & Cooling",
  "quantity": 1,
  "sellingPrice": 249,
  "retailPrice": 249
 }
]
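The line-by-line splicing above relies on the whole assignment sitting on a single line. A slightly sturdier sketch, assuming the page keeps the `window.digitalData.products = [...];` assignment, runs a regular expression over the full HTML instead. The function name `extract_products` is hypothetical, and the pattern assumes the array never contains the character sequence `];`.

```python
import json
import re
from typing import Any, Optional

# Captures the JSON array assigned to window.digitalData.products, up to the closing "];".
# Assumption: the array itself never contains the sequence "];".
PRODUCTS_RE = re.compile(r"window\.digitalData\.products\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_products(html: str) -> Optional[list[dict[str, Any]]]:
    """Return the parsed products list, or None if the assignment isn't in the page."""
    match = PRODUCTS_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))
```

With the response from above, `extract_products(r.text)[0]["sellingPrice"]` should give the selling price.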
The product description and specifications can be found in this element:
<section class="pd-information met-product-information grid-100 grid-parent v-spacing-jumbo">
(It's about 300 lines long, so I'm only showing the opening tag.)
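To pull the text out of that section you don't necessarily need a third-party parser. Here's a sketch using the standard library's `html.parser` (swapped in for illustration; BeautifulSoup would work too). The class name `pd-information` comes from the tag above; the names `SectionTextParser` and `section_text` are hypothetical. Only nested `<section>` tags are counted, so void elements like `<br>` or `<img>` can't throw the depth off.

```python
from html.parser import HTMLParser


class SectionTextParser(HTMLParser):
    """Collect text inside the first <section> whose class list contains target_class."""

    def __init__(self, target_class: str = "pd-information") -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # open <section> tags inside the target (0 = outside it)
        self.done = False     # True once the target section has closed
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "section":
            return
        if self.depth:
            self.depth += 1   # a <section> nested inside the target
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "section":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def section_text(html: str, target_class: str = "pd-information") -> str:
    """Return the non-empty text nodes of the target section, one per line."""
    parser = SectionTextParser(target_class)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

You'd call it as `print(section_text(r.text))` on the product page response.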
There's an API that takes a product ID and a store number, and returns pricing and availability information:
import json
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0"}
url = "https://www.lowes.com/PricingServices/price/balance?productId=1001080012&storeNumber=1955"
r = requests.get(url, headers=headers, timeout=5)
print("return code:", r)
print("content length:", len(r.content))
# r.json() parses the response body as JSON.
print(json.dumps(r.json(), indent=1))
Output:
return code: <Response [200]>
content length: 768
[
 {
  "productId": 1001080012,
  "storeNumber": 1955,
  "isSosVendorDirect": true,
  "price": {
   "selling": "249.00",
   "retail": "249.00",
   "typeCode": 1,
   "typeIndicator": "Regular Price"
  },
  "availability": [
   {
    "availabilityStatus": "Available",
    "productStockType": "STK",
    "availabileQuantity": 822,
    "deliveryMethodId": 1,
    "deliveryMethodName": "Parcel Shipping",
    "storeNumber": 907
   },
   {
    "availabilityStatus": "Available",
    "productStockType": "STK",
    "availabileQuantity": 8,
    "leadTime": 1570529161540,
    "deliveryMethodId": 2,
    "deliveryMethodName": "Store Pickup",
    "storeNumber": 1955
   },
   {
    "availabilityStatus": "Available",
    "productStockType": "STK",
    "availabileQuantity": 1,
    "leadTime": 1570529161540,
    "deliveryMethodId": 3,
    "deliveryMethodName": "Truck Delivery",
    "storeNumber": 1955
   }
  ],
  "@type": "item"
 }
]
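If you only care about a few of those fields, you can flatten the response into one row per delivery method. This is a sketch based solely on the keys in the output above; `summarize_balance` is a hypothetical name. Note that the API really does spell the key `availabileQuantity`. Treating `leadTime` as a Unix timestamp in milliseconds is my assumption (the value above lands in October 2019), not something the API documents.

```python
from datetime import datetime, timezone
from typing import Any


def summarize_balance(payload: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten a PricingServices/price/balance response into one row per delivery method."""
    rows = []
    for item in payload:
        for avail in item.get("availability", []):
            lead_ms = avail.get("leadTime")
            rows.append({
                "productId": item["productId"],
                "selling": float(item["price"]["selling"]),
                "method": avail["deliveryMethodName"],
                "store": avail["storeNumber"],
                # Key is spelled "availabileQuantity" in the actual response.
                "quantity": avail.get("availabileQuantity"),
                # Assumption: leadTime is milliseconds since the Unix epoch.
                "lead": (datetime.fromtimestamp(lead_ms / 1000, tz=timezone.utc)
                         if lead_ms is not None else None),
            })
    return rows
```

For example, `for row in summarize_balance(r.json()): print(row)` prints one line each for parcel shipping, store pickup and truck delivery.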
It can take multiple product IDs, separated by %2C (a URL-encoded comma). For example:
https://www.lowes.com/PricingServices/price/balance?productId=1001080046%2C1001135076%2C1001091656%2C1001086418%2C1001143824%2C1001094006%2C1000170557%2C1000920864%2C1000338547%2C1000265699%2C1000561915%2C1000745998&storeNumber=1564
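Building that URL by hand gets tedious. A small sketch with `urllib.parse.urlencode`, which percent-encodes each comma as %2C; `balance_url` is a hypothetical helper. storeNumber is joined the same way because, as noted further down, it also accepts comma-separated values. requests' `params=` argument should produce the same encoding if you'd rather pass a dict to `session.get`.

```python
from urllib.parse import urlencode

BALANCE_URL = "https://www.lowes.com/PricingServices/price/balance"


def balance_url(product_ids, store_numbers):
    """Join IDs with commas; urlencode turns each comma into %2C."""
    query = urlencode({
        "productId": ",".join(str(p) for p in product_ids),
        "storeNumber": ",".join(str(s) for s in store_numbers),
    })
    return f"{BALANCE_URL}?{query}"
```

`balance_url([1001080046, 1001135076], [1564])` gives `https://www.lowes.com/PricingServices/price/balance?productId=1001080046%2C1001135076&storeNumber=1564`.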
You can get information on every store from this API, which returns a 1.6 MB JSON file. The site normally sets maxResults to 30, and query to your longitude and latitude. I'd suggest saving the result to disk; I doubt it changes much.
https://www.lowes.com/wcs/resources/store/10151/storelocation/v1_0?maxResults=2000&query=0%2C0
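Saving it to disk could look like the sketch below. `load_stores` and the `fetch` callable are hypothetical: `fetch` is whatever returns the response text, e.g. `lambda u: session.get(u, headers=headers, timeout=30).text`. The response is parsed before it is written, so an error page never ends up in the cache. The structure of the returned JSON isn't assumed.

```python
import json
from pathlib import Path
from typing import Any, Callable

STORES_URL = ("https://www.lowes.com/wcs/resources/store/10151/storelocation/v1_0"
              "?maxResults=2000&query=0%2C0")


def load_stores(cache: Path, fetch: Callable[[str], str]) -> Any:
    """Return the store list from `cache`, downloading it with `fetch` only when the file is missing."""
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    text = fetch(STORES_URL)
    data = json.loads(text)  # raises on a non-JSON response, so nothing bad gets cached
    cache.write_text(text, encoding="utf-8")
    return data
```

Delete the cache file whenever you want a fresh copy.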
Keep in mind the PricingServices/price/balance endpoint can take multiple values for storeNumber separated by %2C (a comma), so you won't need 1763 separate GET requests. I still made multiple requests using a requests.Session (so it reuses the underlying connection).
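Splitting the store list into batches might look like this. The batch size of 100 is a guess, not a documented limit, and `chunked` / `balance_urls_for_stores` are hypothetical names. Each URL can then be fetched with the same requests.Session, e.g. `session.get(u, headers=headers, timeout=5).json()`.

```python
from urllib.parse import urlencode

BALANCE_URL = "https://www.lowes.com/PricingServices/price/balance"


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def balance_urls_for_stores(product_id, store_numbers, batch_size=100):
    """One balance URL per batch of stores; batch_size=100 is an assumption."""
    return [
        BALANCE_URL + "?" + urlencode({
            "productId": product_id,
            "storeNumber": ",".join(str(s) for s in batch),
        })
        for batch in chunked(list(store_numbers), batch_size)
    ]
```

For 1763 stores and a batch size of 100, that's 18 requests instead of 1763.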
akstat.io) before the GET request to the actual page you want. You'll probably want to use a requests.Session. You need to replicate the POST payload by going through the stack trace and reading the JavaScript. Alternatively, this might be a job for Selenium.