
What is the best way to parse data out of a URL query string (for instance, data appended to the URL by a form) in Python? My goal is to accept form data and display it on the same page. I've researched several methods, but none are quite what I'm looking for.

I'm creating a simple web server with the goal of learning about sockets. This web server won't be used for anything but testing purposes.

GET /?1pm=sample&2pm=&3pm=&4pm=&5pm= HTTP/1.1
Host: localhost:50000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://localhost:50000/?1pm=sample&2pm=&3pm=&4pm=&5pm=
  • Are you looking to write the parsing from scratch, or what? Commented Apr 11, 2012 at 20:11
  • What's wrong with stackoverflow.com/questions/1349367/… or stackoverflow.com/questions/4685217/parse-raw-http-headers? You haven't given us enough info about what other approaches are lacking. Do you have an example header or two? Commented Apr 11, 2012 at 20:12
  • Nothing is 'wrong' with either of these posts. Based on the programming experience I've had in the past, I'm inclined to do something similar to the regex approach in the second link. However, I wanted to ask and see if there is a simpler way to do it, since this is my first Python program. Commented Apr 11, 2012 at 20:24
  • Looks to me like you're talking about URL query strings, not HTTP headers. You might want to update your question to reflect this. Commented Apr 11, 2012 at 20:57

6 Answers


Here is an example using Python 3's urllib.parse:

from urllib.parse import urlparse, parse_qs
URL = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
parsed_url = urlparse(URL)
parse_qs(parsed_url.query)

Output:

{'i': ['main'], 'mode': ['front'], 'sid': ['12ab'], 'enc': [' Hello']}

(Note that the + in enc=+Hello is decoded to a space.)

Note for Python 2: from urlparse import urlparse, parse_qs
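Since parse_qs returns a list for every key (addressing the question in the comments below about the ['value'] shape), a single value can be pulled out by indexing. A small sketch using the same example URL:

```python
from urllib.parse import urlparse, parse_qs

URL = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
params = parse_qs(urlparse(URL).query)

# Each value is a list (a key may repeat); index to get a single value
print(params['i'][0])    # 'main'
print(params['enc'][0])  # ' Hello' (the '+' decodes to a space)
```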

SEE: https://pythonhosted.org/six/#module-six.moves.urllib.parse


2 Comments

And why are the values like this: ['value']? dict['enc'] gets ['Hello']; how do I get 'Hello'? With split?
@Suisse see stackoverflow.com/questions/11447391/… The values are in a list because multiple values can be encoded for the same key; see stackoverflow.com/questions/2571145/… Hope it helps.

The urllib.parse module is your friend: https://docs.python.org/3/library/urllib.parse.html

Check out urllib.parse.parse_qs (for parsing a query string, i.e. form data sent to the server by GET, or form data posted by POST, at least for non-multipart data). There's also cgi.FieldStorage for interpreting multipart data.

For parsing the rest of an HTTP interaction, see RFC 2616, the HTTP/1.1 protocol specification.
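As a sketch, here is parse_qs applied to the query string from the GET line in the question. One assumption worth flagging: parse_qs drops blank values by default, and the question's form sends several, so keep_blank_values=True is needed to retain them:

```python
from urllib.parse import parse_qs

# Query string from "GET /?1pm=sample&2pm=&3pm=&4pm=&5pm=" in the question
query = '1pm=sample&2pm=&3pm=&4pm=&5pm='

# By default, keys with empty values are discarded
print(parse_qs(query))
# {'1pm': ['sample']}

# keep_blank_values=True retains them
print(parse_qs(query, keep_blank_values=True))
# {'1pm': ['sample'], '2pm': [''], '3pm': [''], '4pm': [''], '5pm': ['']}
```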

10 Comments

I'm not writing the script for him. He specifically asked how to parse query data, at least that's what I read between the lines, even though those are not actually HTTP headers. But I didn't bother commenting on that.
I'm not suggesting that you should write the script for him, but urlparse is only a tiny piece of this puzzle.
For the amount of information he gave, that's all there is to say. Specifically, if you're actually referring to HTTP headers: is he using a webserver which actually allows you to get HTTP headers uninterpreted (via some stream)? Is he using WSGI (where HTTP-headers are interpreted by the framework)? Plain-old CGI, where you have to interpret the environment and hope for the best? Whatever.
urlparse looks like a great resource. The header is pretty simple and I've added it to the original question. As I'm sure you can guess, my initial idea is to parse the get line into an array of strings.
Are you trying to write a webserver? Or some form of packet inspection/inspector?

If you need unique keys from the query string, use dict() with parse_qsl():

>>> import urllib.parse
>>> urllib.parse.urlparse('https://someurl.com/with/query_string?a=1&b=2&b=3').query
'a=1&b=2&b=3'
>>> urllib.parse.parse_qs('a=1&b=2&b=3')
{'a': ['1'], 'b': ['2', '3']}
>>> urllib.parse.parse_qsl('a=1&b=2&b=3')
[('a', '1'), ('b', '2'), ('b', '3')]
>>> dict(urllib.parse.parse_qsl('a=1&b=2&b=3'))
{'a': '1', 'b': '3'}

1 Comment

It's important to notice that casting the list of tuples to a dict doesn't preserve both values of b; all but the last get discarded. Wasn't aware of parse_qsl, good addition.

Built into Python 2.7:

>>> from urlparse import parse_qs
>>> parse_qs("search=quint&tags=python")
{'search': ['quint'], 'tags': ['python']}

Comments


Only for quick one-line prototyping of CGI vars without imports; not the best approach, obviously, but it could be useful.

args = dict(item.split('=') for item in env['QUERY_STRING'].split('&') if item)
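As the comments below point out, this breaks on URL-encoded values. A slightly more robust sketch in the same spirit (the query string here is a made-up example): decode each piece with urllib.parse.unquote_plus and use partition so an '=' inside a value doesn't raise. For real use, parse_qs is still the better choice.

```python
from urllib.parse import unquote_plus

query = '1pm=sample&greeting=hello%20world&enc=%2B1'

# Same shape as the one-liner above, but with percent- and plus-decoding
args = dict(
    (unquote_plus(k), unquote_plus(v))
    for k, _, v in (item.partition('=') for item in query.split('&') if item)
)
print(args)  # {'1pm': 'sample', 'greeting': 'hello world', 'enc': '+1'}
```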

3 Comments

This will break if any parameter in the query string is URL-encoded. "Manual parsing" of URLs is the source of many security issues.
Indeed, hence the warning "only for prototyping"; I posted it to showcase quick parsing without any imports.
I wonder if every URL parser is a "manual parser"? At some point someone had to sit down and write it...

Based on this article, you can use can_ada to parse URLs in Python.

From their project:

import can_ada
urlstring = "https://www.GOoglé.com/./path/../path2/"
url = can_ada.parse(urlstring)
# prints www.xn--googl-fsa.com, the correctly parsed domain name according
# to WHATWG
print(url.hostname)
# prints /path2/, which is the correctly parsed pathname according to WHATWG
print(url.pathname)

import urllib.parse
urlstring = "https://www.GOoglé.com/./path/../path2/"
url = urllib.parse.urlparse(urlstring)
# prints www.googlé.com
print(url.hostname)
# prints /./path/../path2/
print(url.path)

Comments
