How do I properly construct URLs with query strings?

For example, from a website I scrape the value www.abc.com/SomethingHere?x=1&y=2. However, the value I actually get upon scraping is www.abc.com/SomethingHere?x=1&amp;y=2, and sometimes there's a weird %xx at the end that I don't understand. Requests made with these modified strings fail (but are OK if I manually remove the amp; and the percent weirdness). It also makes me afraid of adding more query parameters by simply appending them, as in www.abc.com/SomethingHere?x=1&y=2&z=3.

How do I make sure I get the proper URLs?

2 Answers

Do it in two steps:

import urllib.parse

# first parse the url
>>> parsed = urllib.parse.urlparse('www.abc.com/SomethingHere?x=1&y=2')
>>> parsed
ParseResult(scheme='', netloc='', path='www.abc.com/SomethingHere', params='', query='x=1&y=2', fragment='')

# then parse the query string component (into a dictionary)
>>> q = parsed.query
>>> urllib.parse.parse_qs(q)
{'x': ['1'], 'y': ['2']}
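
Going the other way, from a dictionary back to a query string, can be done with urllib.parse.urlencode, which takes care of the escaping so you never concatenate `&` and `=` by hand. The following is only a minimal sketch: the `http://` scheme and the `q` parameter are made up for illustration and do not come from the question.

```python
from urllib.parse import urlencode

# Assumption: a scheme is added here; the question's URL has none.
base = 'http://www.abc.com/SomethingHere'

# 'q' is a hypothetical parameter, included to show how special
# characters (space, ampersand) are escaped automatically.
params = {'x': 1, 'y': 2, 'q': 'a b&c'}

url = base + '?' + urlencode(params)
print(url)
# http://www.abc.com/SomethingHere?x=1&y=2&q=a+b%26c
```

The space becomes `+` and the literal ampersand inside the value becomes `%26`, so it cannot be confused with a parameter separator.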

Have a look at urlparse in Python's urllib.parse module. Calling urlparse on your URL gives something like:

from urllib.parse import urlparse, urljoin

urlparse('www.abc.com/SomethingHere?x=1&y=2&z=3%%xx')
Output: ParseResult(scheme='', netloc='', path='www.abc.com/SomethingHere', params='', query='x=1&y=2&z=3%%xx', fragment='')

To replace the query string you can then use urljoin, as follows (note that this swaps out the whole query string, not a single parameter):

urljoin('www.abc.com/SomethingHere?x=1&y=2&z=3%%xx', '?x=2')
Output: 'www.abc.com/SomethingHere?x=2'
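
If the goal is to change one parameter and keep the others, one possible approach is to parse the query, update it, and reassemble the URL. This is a rough sketch rather than a definitive recipe: `set_query_param` is a hypothetical helper name, and it assumes each key appears at most once in the query string.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def set_query_param(url, key, value):
    """Return url with key set to value, keeping the other parameters.

    Hypothetical helper; assumes no repeated keys in the query string.
    """
    parsed = urlparse(url)
    # dict() keeps insertion order, so existing keys stay in place
    params = dict(parse_qsl(parsed.query))
    params[key] = str(value)
    return urlunparse(parsed._replace(query=urlencode(params)))

print(set_query_param('www.abc.com/SomethingHere?x=1&y=2', 'x', 2))
# www.abc.com/SomethingHere?x=2&y=2
print(set_query_param('www.abc.com/SomethingHere?x=1&y=2', 'z', 3))
# www.abc.com/SomethingHere?x=1&y=2&z=3
```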

1 Comment

Thank you. How do I get rid of amp; and other idiosyncrasies systematically (i.e., other than with str.replace)?
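
One way this might be handled, offered as a sketch only: the `&amp;` is an HTML entity, which the standard library's html.unescape decodes, while %xx sequences are percent-escapes, which parse_qs decodes when it reads the parameter values. The scraped string below is a made-up example, not the asker's actual data.

```python
import html
from urllib.parse import urlparse, parse_qs

# Hypothetical scraped value containing an HTML entity and a percent-escape
scraped = 'www.abc.com/SomethingHere?x=1&amp;y=hello%20world'

# Step 1: undo HTML escaping (&amp; -> &) and trim stray whitespace
clean = html.unescape(scraped).strip()
print(clean)
# www.abc.com/SomethingHere?x=1&y=hello%20world

# Step 2: parse_qs decodes percent-escapes inside the values
params = parse_qs(urlparse(clean).query)
print(params)
# {'x': ['1'], 'y': ['hello world']}
```

The percent-escapes are deliberately left in the URL itself, since they are usually part of a valid URL; they are decoded only when the individual values are read.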
