I need to update the page_index= query parameter of a URL. I have tried a couple of ways, shown below, but am hitting a wall. I am new to Python and looking for guidance. The page index currently ranges from 0 to 511 (a new page is added daily), and I need to update the URL to loop through all of the indexes. The index always starts at 0.

import urlparse

url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
parts = urlparse.urlparse(url)
parts = parts._replace(query = page_index [2])
parts.geturl()

I get the error:

TypeError Traceback (most recent call last)
<ipython-input-29-066332f37bb3> in <module>()
  3 url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
  4 parts = urlparse.urlparse(url)
----> 5 parts = parts._replace(query = page_index [2])
  6 parts.geturl()
  7
TypeError: 'function' object has no attribute '__getitem__'

2 Answers

You have to pull out the query component of the results of urlparse() and modify it, then reconstruct a new URL, as follows:

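import urlparse  # Python 2; in Python 3 this module is urllib.parse

pr = urlparse.urlparse(url)
# Split the query string into its 'key=value' pieces
parts = pr.query.split('&')
# page_index is the third parameter in this particular URL
parts[2] = 'page_index=2'
new_url = urlparse.urlunparse([pr.scheme, pr.netloc, pr.path, pr.params, "&".join(parts), pr.fragment])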

To iterate this through all of your page numbers, loop over the last two lines for whatever range of page numbers you need.
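As a rough illustration of that loop, here is a minimal sketch. It assumes Python 3 (where the module is urllib.parse rather than urlparse), and the helper name with_page_index is made up for this example. Instead of relying on page_index being at position 2, it looks the parameter up by name, so it should keep working if the parameter order changes.

```python
from urllib.parse import urlparse, parse_qsl, urlencode

def with_page_index(url, page_index):
    """Return url with its page_index query parameter set to page_index.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = urlparse(url)
    # parse_qsl returns (key, value) pairs in their original order
    pairs = parse_qsl(parts.query)
    updated = [(k, str(page_index) if k == "page_index" else v) for k, v in pairs]
    # Rebuild the URL with only the query component replaced
    return parts._replace(query=urlencode(updated)).geturl()

url = ("https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
       "?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US")

# 512 pages, indexes 0 through 511, as described in the question
urls = [with_page_index(url, pg_no) for pg_no in range(512)]
print(urls[2])
```

Each entry in urls can then be fetched in turn; the upper bound of 512 is taken from the question and would need adjusting as new pages are added.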

5 Comments

Regarding your last comment about looping over the last two lines: would I just use a for statement to accomplish this, e.g. for i in parts(0,500,1): print(i)?
Or just for pg_no in range(500): if your page numbers start at zero, or for pg_no in range(1,500): if they start at 1.
Hopefully you can tell me what I am doing wrong one last time; I have been googling for hours. Below is what I have, but I get an error whether I use "&".join(pg_no) or just pg_no. Any suggestions? Or am I completely off? parts[2] = 'page_index=0' for pg_no in range(500): new_url = urlparse.urlunparse([pr.scheme, pr.netloc, pr.path, pr.params, "&".join(parts), "&".join(pg_no), pr.fragment])
Modify the 3rd line in the answer to be parts[2] = 'page_index=%s' % pg_no
That is where I thought I needed to modify the code. Thank you for all the help. I appreciate it!!

The simplest way, just modify the url directly:

base_url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index={}&countries=US"

for pi in range(512):
    this_url = base_url.format(pi)
    # now get it

A slightly more complicated, but more easily customized, way - passing the parameters as a dict:

import requests

url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
params = {
    "start_date": "2016-1-01",
    "end_date"  : "2017-8-26",
    "countries" : "US"
}

for pi in range(512):
    params["page_index"] = pi
    res = requests.get(url, params=params)
    if res.ok:
        html = res.text
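To preview the kind of URL the dict approach produces without sending any requests, a small sketch using only the standard library might look like this. It assumes Python 3.7+ (so the dict keeps its insertion order), and build_url is a made-up name for illustration. requests builds the query string itself, so its exact encoding may differ from urlencode in edge cases.

```python
from urllib.parse import urlencode

url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
params = {
    "start_date": "2016-1-01",
    "end_date": "2017-8-26",
    "countries": "US",
}

def build_url(page_index):
    """Hypothetical helper: combine the fixed params with one page index."""
    # Copy the shared params so the original dict is left unchanged
    query = dict(params, page_index=page_index)
    return url + "?" + urlencode(query)

for pi in range(3):
    print(build_url(pi))
```

Note that page_index ends up last here because it is added last; for most APIs the order of query parameters should not matter, but that is an assumption about this one.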
