Escaping query strings with special characters in Python

Question

I have some pretty messy URLs that I got via scraping. The problem is that they contain spaces or other special characters in the path and query string. Here are some examples:

http://www.example.com/some path/to the/file.html
http://www.example.com/some path/?file=path to/file name.png&name=name.me

So, is there an easy and robust way to escape the URLs so that I can pass them to urlopen? I tried urllib.quote, but it seems to escape the '?', '&', and '=' in the query string as well, and it seems to escape the protocol too. Currently, I am trying to use a regex to separate the protocol, path, and query string and escape them separately, but there are cases where they aren't separated properly. Any advice is appreciated.

If the only problem is spaces, what's wrong with url_str.replace(' ', '%20')?
– Danica, Commented Jun 17, 2012 at 3:10
Dougal, there may be other characters that need to be encoded as well. I'll edit my question soon.
– hndr, Commented Jun 17, 2012 at 3:14

Mark Reed · Accepted Answer · 2012-06-17 03:10:28Z

5

By default, urllib.quote escapes everything except letters, digits, the characters '_.-', and '/'. You can pass it a string of additional characters to leave alone as the second argument (safe):

>>> import urllib  # Python 2
>>> urllib.quote('http://www.example.com/some path/?file=path to/file name.png&name=name.me',
...              '/:?&=')
'http://www.example.com/some%20path/?file=path%20to/file%20name.png&name=name.me'
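For readers on Python 3, a minimal sketch of the same call, assuming only that the function is imported from urllib.parse (where quote lives in Python 3) and that the same safe characters are wanted:

```python
# Python 3 sketch: quote() lives in urllib.parse rather than urllib.
from urllib.parse import quote

url = 'http://www.example.com/some path/?file=path to/file name.png&name=name.me'

# Same safe set as in the answer: leave the URL delimiters alone,
# percent-encode everything else that is not always safe.
escaped = quote(url, safe='/:?&=')
print(escaped)
```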

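But this is pretty tricky stuff to be messing with semi-manually: a literal '?', '&', or '=' inside a parameter value is left unescaped too, so the result is only correct when those characters appear solely as delimiters.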
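One way to avoid the hand-written regex from the question is to let the standard library split the URL and then quote each component on its own. The following is only a sketch under a few assumptions: it targets Python 3 (urllib.parse), the helper name escape_url is hypothetical, and '%' is treated as safe so that already-escaped sequences are not encoded twice (which also means a stray literal '%' would pass through unescaped).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_url(url):
    """Hypothetical helper: percent-encode the path and query separately."""
    parts = urlsplit(url)
    # Keep '/' as the path separator; '%' is kept so existing escapes survive.
    path = quote(parts.path, safe='/%')
    # Keep the query delimiters '=' and '&'; '/' is left as-is as well.
    query = quote(parts.query, safe='=&/%')
    # Scheme, host, and fragment are passed through untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(escape_url('http://www.example.com/some path/to the/file.html'))
print(escape_url('http://www.example.com/some path/?file=path to/file name.png&name=name.me'))
```

Because the scheme and host never go through quote, the ':' and '//' of the protocol are not at risk of being escaped. A literal '&' or '=' inside a parameter value is still ambiguous in this sketch; handling that would require knowing where each value starts and ends.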

