I am trying to access the API that returns program data on this page when you scroll down and new tiles are displayed. Using Chrome DevTools I found the API being called and put together the following Requests script:

import cloudscraper
import requests

session = requests.session()

url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'

session.headers = {
'Host': 'https://www.nowtv.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'Referer': 'https://www.nowtv.com',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}

scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)

data = r.content
print(data)

session.close()

This is returning the following only:

b'<HTML><HEAD>\n<TITLE>Invalid URL</TITLE>\n</HEAD><BODY>\n<H1>Invalid URL</H1>\nThe requested URL "&#91;no&#32;URL&#93;", is invalid.<p>\nReference&#32;&#35;9&#46;3c0f0317&#46;1608324989&#46;5902cff\n</BODY></HTML>\n'

I assume the issue is the bracketed part at the end of the URL (the parentheses and square brackets). I am not sure, however, how to handle these in a Requests call. Can anyone provide the correct syntax?

Thanks

1 Answer

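The issue is the value of the Host session header: it is a full URL rather than a host name, and it does not match the API host. The Host header is filled in from the request URL automatically, so don't set it yourself.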
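
A minimal sketch of that fix, assuming the other headers from the question stay as they are. The helper name drop_host is made up for illustration, and only the standard library is used so the snippet runs without touching the network. It shows which value the client derives from the URL when no Host entry is supplied.

```python
from urllib.parse import urlsplit

url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'

# Headers copied from the browser, as in the question (shortened here).
copied_headers = {
    'Host': 'https://www.nowtv.com',  # wrong: a URL, and not the API host
    'Accept': 'application/json, text/javascript, */*',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'https://www.nowtv.com',
}


def drop_host(headers):
    """Return a copy of headers without any Host entry (hypothetical helper)."""
    return {k: v for k, v in headers.items() if k.lower() != 'host'}


headers = drop_host(copied_headers)

# The host name taken from the URL is what gets sent as Host
# when the header is left out.
expected_host = urlsplit(url).netloc
print(expected_host)

# With Requests, the cleaned headers would then be used like this:
#   session.headers.update(headers)
#   session.get(url)
```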


That should be enough. But I've done some additional things as well:

  • add the X-* headers:

    session.headers.update({
        'X-SkyOTT-Proposition': 'NOWTV',
        'X-SkyOTT-Language': 'en',
        'X-SkyOTT-Platform': 'PC',
        'X-SkyOTT-Territory': 'GB',
        'X-SkyOTT-Device': 'COMPUTER'
    })
    
  • visit the main page without XHR header set and with a broader Accept header value:

    text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 
    
  • I've also used params for the GET parameters. You don't have to, I think; it's just cleaner:

     In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
    
     In [34]: response = session.get(url, params={
                  'slug': '/entertainment/collections/all-entertainment', 
                  'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
              }, headers={
                  'Accept': 'application/json, text/plain, */*', 
                  'X-Requested-With':'XMLHttpRequest'
              })
    
     In [35]: response
     Out[35]: <Response [200]>
    
     In [36]: response.text
     Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
               ...
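
On the brackets in the URL: they are ordinary characters inside a query-string value, and passing them through params lets the library percent-encode them for you. Below is a small sketch of that round trip using only the standard library's urllib.parse. Requests does its own encoding when you pass params, so the exact string it sends may differ in detail, but the values decode back to the same text. The helper name build_query_url is made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
params = {
    'slug': '/entertainment/collections/all-entertainment',
    'represent': '(items[take=60](items(items[select_list=iceberg])))',
}


def build_query_url(base_url, query):
    """Append percent-encoded query parameters to base_url (illustrative helper)."""
    return base_url + '?' + urlencode(query)


full_url = build_query_url(base, params)
print(full_url)

# Decoding the query string gives back the original values,
# brackets and parentheses included.
decoded = {k: v[0] for k, v in parse_qs(urlsplit(full_url).query).items()}
print(decoded == params)
```

Keeping the base URL and the parameters separate also makes it easy to change take or skip for paging without editing a long URL string by hand.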
    
1 Comment

Hi, thanks, that works for me now. I'm not really a web developer or expert. I've seen params before, but I've never seen them like that in the URL, which is what confused me.
