
I'm new to scraping and have a question about whether what I want to do is technically possible.

I would like to scrape a website, but I see that the API behind the website actually exposes more information than the website itself. Since I can see the API data in my web browser, is there a way to scrape this data, for example the way I would scrape the front end with Selenium?

In the image you can see some of the API data of the site.

Thanks a lot!

Example Image

1 Answer

Parsing that object takes a fair amount of work, but it can be done. To intercept those network calls, you can use the selenium-wire library:

from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from seleniumwire import webdriver  # Import webdriver from seleniumwire, not selenium

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"E:\chromedriver.exe"))

driver.get('https://www.delhaize.be/nl-be/shop/Verse-groenten-en-fruit/c/v2FRU?q=:relevance&sort=relevance')

# Accept the privacy pop-up to enable the analytics
driver.find_element(By.XPATH, "//button[contains(@data-testid,'cookie-popup-accept')]").click()

# Captured requests are available via the `requests` attribute
for request in driver.requests:
    # Only look at API calls that have already received a response
    if request.response and request.url == 'https://api.delhaize.be/':
        print(
            request.url,
            request.body,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

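The output will look something like the following. Note that request.body is the request payload (here a GraphQL query), not the data the API returned, and it still needs parsing: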

https://api.delhaize.be/ b'{"operationName":"GetCategoryProductSearch","variables":{"lang":"nl","searchQuery":":relevance","sort":"relevance","category":"v2FRU","pageNumber":0,"pageSize":20,"filterFlag":true},"query":"query GetCategoryProductSearch($anonymousCartCookie: String, $lang: String, $searchQuery: String, $pageSize: Int, $pageNumber: Int, $category: String, $sort: String, $filterFlag: Boolean) {\\n  categoryProductSearch(anonymousCartCookie: $anonymousCartCookie, lang: $lang, searchQuery: $searchQuery, pageSize: $pageSize, pageNumber: $pageNumber, category: $category, sort: $sort, filterFlag: $filterFlag) {\\n    products {\\n      ...ProductBlockDetails\\n      __typename\\n    }\\n    breadcrumbs {\\n      ...Breadcrumbs\\n      __typename\\n    }\\n    facets {\\n      ...Facets\\n      __typename\\n    }\\n    sorts {\\n      name\\n      selected\\n      code\\n      __typename\\n    }\\n    pagination {\\n      ...Pagination\\n      __typename\\n    }\\n    currentQuery {\\n      query {\\n        value\\n        __typename\\n      }\\n      __typename\\n    }\\n    categorySearchTree {\\n      categoryDataList {\\n        categoryCode\\n        categoryData {\\n          facetData {\\n            count\\n            name\\n            query {\\n              query {\\n                value\\n                __typename\\n              }\\n              url\\n              __typename\\n            }\\n            selected\\n            __typename\\n          }\\n          subCategories\\n          __typename\\n        }\\n        __typename\\n      }\\n      level\\n      __typename\\n    }\\n    __typename\\n  }\\n}\\n\\nfragment ProductBlockDetails on Product {\\n  available\\n  averageRating\\n  numberOfReviews\\n  manufacturerName\\n  manufacturerSubBrandName\\n  code\\n  freshnessDuration\\n  freshnessDurationTipFormatted\\n  frozen\\n  recyclable\\n  images {\\n    format\\n    imageType\\n    url\\n    __typename\\n  }\\n  maxOrderQuantity\\n  
limitedAssortment\\n  name\\n  onlineExclusive\\n  potentialPromotions {\\n    alternativePromotionMessage\\n    code\\n    priceToBurn\\n    promotionType\\n    range\\n    redemptionLevel\\n    toDisplay\\n    description\\n    title\\n    promoBooster\\n    simplePromotionMessage\\n    __typename\\n  }\\n  price {\\n    approximatePriceSymbol\\n    currencySymbol\\n    formattedValue\\n    priceType\\n    supplementaryPriceLabel1\\n    supplementaryPriceLabel2\\n    showStrikethroughPrice\\n    discountedPriceFormatted\\n    unit\\n    unitCode\\n    unitPrice\\n    value\\n    __typename\\n  }\\n  purchasable\\n  productProposedPackaging\\n  productProposedPackaging2\\n  stock {\\n    inStock\\n    inStockBeforeMaxAdvanceOrderingDate\\n    partiallyInStock\\n    availableFromDate\\n    __typename\\n  }\\n  url\\n  previouslyBought\\n  nutriScoreLetter\\n  __typename\\n}\\n\\nfragment Breadcrumbs on SearchBreadcrumb {\\n  facetCode\\n  facetName\\n  facetValueName\\n  facetValueCode\\n  removeQuery {\\n    query {\\n      value\\n      __typename\\n    }\\n    __typename\\n  }\\n  __typename\\n}\\n\\nfragment Facets on Facet {\\n  code\\n  name\\n  category\\n  facetUiType\\n  values {\\n    code\\n    count\\n    name\\n    query {\\n      query {\\n        value\\n        __typename\\n      }\\n      __typename\\n    }\\n    selected\\n    __typename\\n  }\\n  __typename\\n}\\n\\nfragment Pagination on Pagination {\\n  currentPage\\n  totalResults\\n  totalPages\\n  sort\\n  __typename\\n}\\n"}' 200 application/json; charset=utf-8
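As a rough sketch of the parsing step, the bytes printed above are plain JSON, so the standard json module can split them into the operation name, the variables and the query text. The captured_body value below is a shortened stand-in for the real output (the query string is replaced by a placeholder), and parse_graphql_body and with_page are hypothetical helper names, not part of selenium-wire. Whether the API accepts a modified pageNumber is an untested assumption.

```python
import json

# Shortened stand-in for the bytes printed by request.body above.
# The real "query" string is much longer; it is replaced by a placeholder here.
captured_body = (
    b'{"operationName":"GetCategoryProductSearch",'
    b'"variables":{"lang":"nl","searchQuery":":relevance","sort":"relevance",'
    b'"category":"v2FRU","pageNumber":0,"pageSize":20,"filterFlag":true},'
    b'"query":"<abbreviated>"}'
)


def parse_graphql_body(raw: bytes) -> dict:
    """Decode the raw request bytes and parse them as JSON."""
    return json.loads(raw.decode("utf-8"))


def with_page(payload: dict, page_number: int) -> dict:
    """Return a copy of the payload whose variables ask for another page."""
    updated = dict(payload)
    updated["variables"] = {**payload["variables"], "pageNumber": page_number}
    return updated


payload = parse_graphql_body(captured_body)
print(payload["operationName"])
print(payload["variables"]["category"], payload["variables"]["pageSize"])

# Same operation and query, but with pageNumber bumped to 1
next_payload = with_page(payload, 1)
print(json.dumps(next_payload["variables"], sort_keys=True))
```

The product data itself is in the response rather than the request: in selenium-wire it should be reachable as request.response.body, which may arrive compressed depending on the Content-Encoding header and would need decoding before json.loads can read it.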