Invalid URL when using Python Requests

Question

I am trying to access the API that returns programme data on this page; it is called when you scroll down and new tiles are displayed on the screen. Using Chrome DevTools, I found the API being called and put together the following Requests script:

import cloudscraper
import requests

session = requests.session()

url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node?slug=/entertainment/collections/all-entertainment&represent=(items[take=60](items(items[select_list=iceberg])))'

session.headers = {
'Host': 'https://www.nowtv.com',
'Connection': 'keep-alive',
'Accept': 'application/json, text/javascript, */*',
'X-Requested-With': 'XMLHttpRequest',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
'Referer': 'https://www.nowtv.com',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
}

scraper = cloudscraper.create_scraper(sess=session)
r = scraper.get(url)

data = r.content
print(data)

session.close()

This is returning the following only:

b'<HTML><HEAD>\n<TITLE>Invalid URL</TITLE>\n</HEAD><BODY>\n<H1>Invalid URL</H1>\nThe requested URL "&#91;no&#32;URL&#93;", is invalid.<p>\nReference&#32;&#35;9&#46;3c0f0317&#46;1608324989&#46;5902cff\n</BODY></HTML>\n'

I assume the issue is the part at the end of the URL that contains parentheses and square brackets. However, I am not sure how to handle these in a Requests call. Can anyone provide the correct syntax?

Thanks

alecxe · Accepted Answer · 2020-12-18 21:28:10Z


The problem is the Host header set on the session: its value is a full URL rather than a hostname, and it doesn't match the API host. Don't set it.
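As a quick illustration of the difference, the sketch below uses only the standard library to show the Host value that would be derived from the request URL, next to the value from the question. The helper names `expected_host` and `looks_like_valid_host` are made up for this example and are not part of Requests; the validity check is deliberately crude.

```python
from urllib.parse import urlsplit

api_url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
bad_host = 'https://www.nowtv.com'  # the Host value used in the question


def expected_host(url):
    """Return the host[:port] part of a URL, which is what a Host header carries."""
    return urlsplit(url).netloc


def looks_like_valid_host(value):
    """Crude check (illustration only): a Host value has no scheme and no path."""
    return '://' not in value and '/' not in value


print(expected_host(api_url))            # ie.api.atom.nowtv.com
print(looks_like_valid_host(bad_host))   # False
```

If the header is simply left out of `session.headers`, the HTTP stack underneath Requests fills it in from the URL, so there is normally no need to set it by hand.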

That should be enough, but I also did a few additional things:

add the X-* headers:

session.headers.update({
    'X-SkyOTT-Proposition': 'NOWTV',
    'X-SkyOTT-Language': 'en',
    'X-SkyOTT-Platform': 'PC',
    'X-SkyOTT-Territory': 'GB',
    'X-SkyOTT-Device': 'COMPUTER'
})

visit the main page without XHR header set and with a broader Accept header value:
```
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8 
```

I've also passed the GET parameters via params. I don't think you have to, but it's cleaner, and Requests takes care of encoding the values for you:

 In [33]: url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'

 In [34]: response = session.get(url, params={
              'slug': '/entertainment/collections/all-entertainment', 
              'represent': '(items[take=60,skip=2340](items(items[select_list=iceberg])))'
          }, headers={
              'Accept': 'application/json, text/plain, */*', 
              'X-Requested-With':'XMLHttpRequest'
          })

 In [35]: response
 Out[35]: <Response [200]>

 In [36]: response.text
 Out[36]: '{"links":{"self":"/adapter-atlas/v3/query/node/e5b0e516-2b84-11e9-b860-83982be1b6a6"},"id":"e5b0e516-2b84-11e9-b860-83982be1b6a6","type":"CATALOGUE/COLLECTION","segmentId":"","segmentName":"default","childTypes":{"next_items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":68},"items":{"nodeTypes":["ASSET/PROGRAMME","CATALOGUE/SERIES"],"count":2376},"curation-config":{"nodeTypes":["CATALOGUE/CURATIONCONFIG"],"count":1}},"attributes":{"childNodeTyp
           ...
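To see roughly what happens to the parentheses and square brackets when they are passed as parameters, here is a small standard-library sketch. It uses `urllib.parse.urlencode` as a stand-in, on the assumption that Requests percent-encodes `params` in a comparable way; it does not send any request, and the `take=60` value is the one from the question.

```python
from urllib.parse import parse_qs, urlencode

base_url = 'https://ie.api.atom.nowtv.com/adapter-atlas/v3/query/node'
params = {
    'slug': '/entertainment/collections/all-entertainment',
    'represent': '(items[take=60](items(items[select_list=iceberg])))',
}

# Percent-encode the keys and values into a query string.
query = urlencode(params)
full_url = f'{base_url}?{query}'
print(full_url)

# Decoding the query string gives back the original values,
# so the brackets survive the round trip unchanged.
decoded = {key: values[0] for key, values in parse_qs(query).items()}
print(decoded == params)
```

The brackets themselves are therefore not a problem: whether they appear literally in the URL or are passed through `params`, the server should receive the same parameter values.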

edited Dec 18, 2020 at 21:28

answered Dec 18, 2020 at 21:22

alecxe


1 Comment

gdogg371 Over a year ago

Hi, thanks, that works for me now. I'm not really a web developer or an expert. I've seen params before, but I've never seen them like that in the URL, which is what confused me.
