How to extract some data from url using Python

Question

I have a URL as follows:

https://some_url/vivi/v2/ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=/BE?category=PASSENGER&make=30&model=124&regmonth=3&regdate=2015-03&body=443,4781&facelift=252&seats=4&bodyHeight=443&bodyLength=443&weight=-1&engine=1394&wheeldrive=196&transmission=400

What I need is to get the string after v2/, thus ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=

I use furl to extract the parameter value. I do it as follows:

furl(url).args['category']  # gives PASSENGER

But the string I need is not attached to any parameter name.

How can I do that?

Split the string and take the element by index

– Srinivas Reddy Thatiparthy, Commented Dec 8, 2017 at 10:50

imox · Accepted Answer · 2017-12-08 10:50:42Z

2

If you don't need a general solution and only have to handle the URL given in the question, you can do the following:

url="https://some_url/vivi/v2/ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=/BE?category=PASSENGER&make=30&model=124&regmonth=3&regdate=2015-03&body=443,4781&facelift=252&seats=4&bodyHeight=443&bodyLength=443&weight=-1&engine=1394&wheeldrive=196&transmission=400"
answer = url.split('/')[5]  # index 5 is the segment right after 'v2'


Ahmad · Accepted Answer · 2017-12-08 10:51:43Z

0

Use the following code:

parts = url.split('/')
token = parts[parts.index('v2') + 1]  # the element right after 'v2'
print(token)
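The lookup above raises ValueError if 'v2' is not in the URL. A minimal sketch of a more defensive variant follows; the helper name segment_after and the shortened URL with a placeholder TOKEN are assumptions made for illustration, not part of the original answer.

```python
def segment_after(url, marker):
    """Return the path segment that follows `marker`, or None if there is none."""
    # Drop the query string first so a '/' inside a parameter value
    # cannot be mistaken for a path separator.
    path = url.split('?', 1)[0]
    parts = path.split('/')
    try:
        idx = parts.index(marker)
    except ValueError:
        return None  # marker does not occur as a whole segment
    # Guard against the marker being the last segment.
    return parts[idx + 1] if idx + 1 < len(parts) else None


# Shortened, hypothetical URL; TOKEN stands in for the real string.
url = "https://some_url/vivi/v2/TOKEN/BE?category=PASSENGER&make=30"
print(segment_after(url, "v2"))  # TOKEN
print(segment_after(url, "v3"))  # None
```

Returning None instead of raising is a design choice; raising may be preferable if a missing marker should be treated as an error.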


Tom Wojcik · Accepted Answer · 2017-12-08 11:14:42Z

0

You can get the desired output using re:

import re

url = "https://some_url/vivi/v2/ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=/BE?category=PASSENGER&make=30&model=124&regmonth=3&regdate=2015-03&body=443,4781&facelift=252&seats=4&bodyHeight=443&bodyLength=443&weight=-1&engine=1394&wheeldrive=196&transmission=400"
re.findall(r'v2/(.*)/', url)

This returns ['ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0='].

But it's safer to use split() the way others mentioned, because when the API version changes to v3 this regex won't work anymore.
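If the version prefix is expected to change, one possible adjustment is to match any "v" followed by digits rather than the literal v2. The sketch below assumes the version always appears as its own path segment in that form. The pattern and the name token_after_version are hypothetical, and the URLs are shortened with a placeholder token.

```python
import re

# Assumed layout: ".../v<digits>/<token>/...". The capture group stops at the
# next '/', '?' or '#', so it cannot run on into later segments or the query.
VERSIONED_SEGMENT = re.compile(r'/v\d+/([^/?#]+)')


def token_after_version(url):
    """Return the segment after the version segment, or None if no match."""
    match = VERSIONED_SEGMENT.search(url)
    return match.group(1) if match else None


print(token_after_version("https://some_url/vivi/v2/TOKEN=/BE?category=PASSENGER"))  # TOKEN=
print(token_after_version("https://some_url/vivi/v3/TOKEN=/BE"))  # TOKEN=
```

Using [^/?#]+ instead of the greedy .* also avoids over-matching when a later part of the URL contains another slash.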


mhawke · Accepted Answer · 2017-12-08 11:19:15Z

The string that you are after is not a query parameter; it is part of the URL path.

In the general case you can use the urllib.parse module to parse the URL into its components, then access the path. Then extract the required part of the path:

>>> import base64
>>> from urllib.parse import urlparse, parse_qs
>>> parsed_url = urlparse(url)    # url is the URL string from the question
>>> s = parsed_url.path.split('/')[-2]    # second-to-last component of the path
>>> s
'ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0='
>>> base64.b64decode(s)
b'eLNfekw2jMDqWm0CDoFWzK+DAchq8p0eglRxD4+g2LxdArxObu3WZQ=='

The keys and values of the query string can also be processed into a dictionary and accessed by key:

>>> params = parse_qs(parsed_url.query)
>>> params
{'category': ['PASSENGER'], 'make': ['30'], 'model': ['124'], 'regmonth': ['3'], 'regdate': ['2015-03'], 'body': ['443,4781'], 'facelift': ['252'], 'seats': ['4'], 'bodyHeight': ['443'], 'bodyLength': ['443'], 'weight': ['-1'], 'engine': ['1394'], 'wheeldrive': ['196'], 'transmission': ['400']}
>>> params['category']
['PASSENGER']
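Because parse_qs wraps every value in a list, a flat dictionary can be more convenient when each key is known to appear only once. The following is a rough sketch using parse_qsl from the same module. The shortened URL and the placeholder TOKEN are assumptions for illustration, and the flat dict keeps only the last value if a key repeats.

```python
from urllib.parse import urlsplit, parse_qsl

# Shortened, hypothetical URL; TOKEN stands in for the real path segment.
url = "https://some_url/vivi/v2/TOKEN/BE?category=PASSENGER&make=30&body=443,4781"

parts = urlsplit(url)
token = parts.path.split('/')[-2]      # second-to-last path component
params = dict(parse_qsl(parts.query))  # flat dict; the last value wins on repeats

print(token)                       # TOKEN
print(params['category'])          # PASSENGER
print(params['body'].split(','))   # ['443', '4781']
```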
