Change query in URL with Python

Question

I need to update the query part (page_index=) of a URL. I have tried a couple of ways, shown below, but am hitting a wall. I am new to Python and looking for guidance. The page index ranges from 0 to 511 (new pages are added daily), and I need to update the URL to loop through all of the indexes. The index always starts at 0.

import urlparse

url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
parts = urlparse.urlparse(url)
parts = parts._replace(query = page_index [2])
parts.geturl()

I get the error:

TypeError Traceback (most recent call last)
<ipython-input-29-066332f37bb3> in <module>()
  3 url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
  4 parts = urlparse.urlparse(url)
----> 5 parts = parts._replace(query = page_index [2])
  6 parts.geturl()
  7
TypeError: 'function' object has no attribute '__getitem__'
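The traceback says page_index refers to a function in that session, so page_index [2] tries to subscript a function. Even with a valid value, _replace(query=...) replaces the entire query string, not a single parameter. A minimal sketch of that approach, assuming Python 3 (where the urlparse module became urllib.parse); with_page_index is a hypothetical helper name:

```python
from urllib.parse import urlparse, parse_qsl, urlencode

url = ("https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
       "?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US")


def with_page_index(url, page_index):
    """Hypothetical helper: return url with its page_index parameter set."""
    parts = urlparse(url)
    # parse_qsl returns the key/value pairs in their original order
    query = [(key, str(page_index) if key == "page_index" else value)
             for key, value in parse_qsl(parts.query)]
    # _replace needs the whole new query string, so re-encode every pair
    return parts._replace(query=urlencode(query)).geturl()


print(with_page_index(url, 2))
```

Because page_index is found by name rather than by position, this keeps working if the parameter order in the URL changes.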

rd_nielsen · Accepted Answer · 2017-07-09 20:08:12Z


You have to pull out the query component of the result of urlparse(), modify it, and then reconstruct a new URL, as follows:

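import urlparse

pr = urlparse.urlparse(url)
# split the query string into its key=value pairs
parts = pr.query.split('&')
# page_index is the third pair in this particular URL
parts[2] = 'page_index=2'
# rebuild the URL from its six components, using the modified query
new_url = urlparse.urlunparse([pr.scheme, pr.netloc, pr.path, pr.params, "&".join(parts), pr.fragment])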

To iterate through all of your page numbers, put the last two lines inside a loop over whatever range of page numbers you need.
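Putting the answer's last two lines inside a loop, with the page_index=%s formatting suggested in the comments, might look like this. It is a sketch assuming Python 3's urllib.parse (which provides the same urlparse and urlunparse functions as Python 2's urlparse module) and the 512 pages, 0 to 511, mentioned in the question; page_urls is a hypothetical list for collecting the results:

```python
from urllib.parse import urlparse, urlunparse

url = ("https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
       "?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US")

pr = urlparse(url)
parts = pr.query.split("&")

page_urls = []
for pg_no in range(512):  # pages 0 through 511
    # parts[2] is page_index only because of this URL's parameter order
    parts[2] = "page_index=%s" % pg_no
    new_url = urlunparse([pr.scheme, pr.netloc, pr.path, pr.params,
                          "&".join(parts), pr.fragment])
    page_urls.append(new_url)  # or fetch new_url here instead

print(page_urls[0])
print(page_urls[-1])
```

The assignment to parts[2] has to happen inside the loop; setting it once before the loop leaves every URL with the same page index.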

answered Jul 9, 2017 at 20:08

rd_nielsen


5 Comments

Daniel Strong Over a year ago

Regarding your last comment about looping over the last two lines: would I just use a for statement to accomplish this, e.g. for i in parts(0,500,1): print(i)?

rd_nielsen Over a year ago

Or just for pg_no in range (500): if your page numbers start at zero, or for pg_no in range(1,500) if they start at 1.

Daniel Strong Over a year ago

Hopefully you can tell me what I am doing wrong one last time; I have been googling for hours. Below is what I have, but I get an error when I use "&".join(pg_no) or just pg_no. Any suggestions? Or am I completely off? parts[2] = 'page_index=0' for pg_no in range(500): new_url = urlparse.urlunparse([pr.scheme, pr.netloc, pr.path, pr.params, "&".join(parts), "&".join(pg_no), pr.fragment])

rd_nielsen Over a year ago

Change the third line in the answer to parts[2] = 'page_index=%s' % pg_no and put it inside the loop.

Daniel Strong Over a year ago

That is where I thought I needed to modify the code. Thank you for all the help. I appreciate it!!
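The stop value of range is exclusive, so both range(500) and range(1,500) in the comments above stop at 499. For the 0 to 511 indexes in the question, the bounds work out as follows:

```python
# range(stop) counts from 0 up to, but not including, stop
zero_based = list(range(512))
print(zero_based[0], zero_based[-1], len(zero_based))  # 0 511 512

# with a start value, the stop is still excluded
one_based = list(range(1, 513))
print(one_based[0], one_based[-1], len(one_based))  # 1 512 512
```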

Hugh Bothwell · Accepted Answer · 2017-07-09 20:06:24Z


The simplest way is to modify the URL string directly:

base_url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index={}&countries=US"

for pi in range(512):
    this_url = base_url.format(pi)
    # now get it

A slightly more complicated but more easily customized way is to pass the parameters as a dict:

import requests

url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
params = {
    "start_date": "2016-1-01",
    "end_date"  : "2017-8-26",
    "countries" : "US"
}

for pi in range(512):
    params["page_index"] = pi
    # requests URL-encodes the params dict into the query string
    res = requests.get(url, params=params)
    if res.ok:
        html = res.text
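requests builds the query string from the params dict itself. To preview a comparable URL without sending any request, the standard library's urllib.parse.urlencode can be used. This is a sketch assuming Python 3.7+ (so the dict keeps insertion order); page_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
params = {
    "start_date": "2016-1-01",
    "end_date": "2017-8-26",
    "countries": "US",
}


def page_url(page_index):
    """Hypothetical helper: full URL for one page, without mutating params."""
    # dict(params, ...) copies params and appends page_index as the last key
    query = dict(params, page_index=page_index)
    return url + "?" + urlencode(query)


print(page_url(0))
```

Copying the dict instead of assigning params["page_index"] keeps the shared params unchanged between pages.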

edited Jul 9, 2017 at 20:06

answered Jul 9, 2017 at 19:57

Hugh Bothwell

