Split a URL in Python

Question

I have this URL:

/drive/rayon.productlist.seomenulevel/fh_refpath$003dfacet_1$0026fh_refview$003dlister$0026fh_view_size$003d100$0026fh_reffacet$003dcategories$0026auchan_page_type$003dcatalogue$0026fh_location$003d$00252f$00252f52$00252ffr_FR$00252fdrive_id$00253d993$00252fcategories$00253c$00257b52_3686967$00257d$00252fcategories$00253c$00257b52_3686967_3686326$00257d$00252fcategories$00253c$00257b52_3686967_3686326_3700610$00257d$00252fcategories$00253c$00257b52_3686967_3686326_3700610_3700620$00257d/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610

I want the last three numbers: item[0] = 3700620, item[1] = 3686967 and item[2] = 3700610.

I tried this:

one =   url.split('/')[-1]
two =   url.split('/')[-2]

The result of the first one is 3700610,

and the second one is 3700620?t:ac=3686967

alecxe · Accepted Answer · 2016-04-25 16:29:24Z

4

A non-regex approach would involve using urlparse (the Python 2 module name; in Python 3 the same functions live in urllib.parse) and a bit of splitting:

>>> import urlparse
>>> parsed_url = urlparse.urlparse(url) 
>>> number1 = parsed_url.path.split("/")[-1]
>>> number2, number3 = urlparse.parse_qs(parsed_url.query)["t:ac"][0].split("/")
>>> number1, number2, number3
('3700620', '3686967', '3700610')
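
For Python 3, a minimal sketch of the same approach might look like the following. The function name extract_numbers and the shortened sample URL are assumptions made for illustration; they are not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def extract_numbers(url):
    """Return (last path segment, first t:ac value, second t:ac value)."""
    parsed = urlparse(url)
    # The last path segment is the number just before the '?'
    number1 = parsed.path.split("/")[-1]
    # parse_qs maps each query key to a list of values
    number2, number3 = parse_qs(parsed.query)["t:ac"][0].split("/")
    return number1, number2, number3


# Shortened stand-in for the URL in the question (illustrative assumption)
sample = "/drive/rayon.productlist.seomenulevel/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"
print(extract_numbers(sample))
```

A missing t:ac parameter would raise a KeyError here, so real code may want to handle that case.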

Regex approach:

>>> import re
>>> re.search(r"/(\d+)\?t:ac=(\d+)/(\d+)$", url).groups()
('3700620', '3686967', '3700610')

Here each (\d+) is a capturing group that matches one or more digits, \? matches a literal question mark (it has to be escaped because ? is a regex quantifier), and $ matches the end of the string.

You can also name the groups and produce a dictionary:

>>> re.search(r"/(?P<number1>\d+)\?t:ac=(?P<number2>\d+)/(?P<number3>\d+)", url).groupdict()
{'number2': '3686967', 'number3': '3700610', 'number1': '3700620'}
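
Note that re.search returns None when the URL does not match, so calling .groupdict() directly would raise an AttributeError. A hedged sketch of a small wrapper that guards against this is shown below; the names PATTERN and extract_ids and the shortened sample URL are hypothetical, chosen only for illustration.

```python
import re

# Named-group pattern from the answer, compiled once,
# with $ added to anchor it at the end of the string
PATTERN = re.compile(
    r"/(?P<number1>\d+)\?t:ac=(?P<number2>\d+)/(?P<number3>\d+)$"
)


def extract_ids(url):
    """Return a dict of the three numbers, or None if the URL does not match."""
    match = PATTERN.search(url)
    # re.search returns None on no match, so check before calling groupdict()
    return match.groupdict() if match else None


# Shortened stand-in for the URL in the question (illustrative assumption)
sample = "/drive/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"
print(extract_ids(sample))
print(extract_ids("/drive/no-numbers-here"))
```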

edited Apr 25, 2016 at 16:29

answered Apr 25, 2016 at 16:26

alecxe


1 Comment

CodenameLambda Over a year ago

This is why I hate regex. It just looks messy. Although it is easy, if you try to understand it.

Ouss4 · Accepted Answer · 2016-04-25 16:31:39Z

2

Another solution using regex.

import re
re.findall(r'\d+', url)[-3:]
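
This relies on the three wanted numbers being the last digit runs in the URL, because findall also picks up digits from earlier segments. A small illustration on the tail of the URL (the variable names tail, all_runs and last_three are assumptions for this sketch):

```python
import re

# Tail of the URL from the question; the encoded segment also contains digits
tail = "/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"

# Every run of digits, in order of appearance
all_runs = re.findall(r"\d+", tail)
print(all_runs)

# Slicing keeps only the final three runs
last_three = all_runs[-3:]
print(last_three)
```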

answered Apr 25, 2016 at 16:31

Ouss4


Vincent Savard · Accepted Answer · 2016-04-25 16:28:04Z

1

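The following two expressions should work for the two numbers in the second-to-last segment (3686967 and 3700620, respectively); the remaining number is url.split('/')[-1], as in the question.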

url.split('/')[-2].split('=')[1]
url.split('/')[-2].split('?')[0]

edited Apr 25, 2016 at 16:28

Vincent Savard


answered Apr 25, 2016 at 16:27

giosans


trans1st0r · Accepted Answer · 2016-04-25 16:28:11Z

1

Try this:

split_list = url.split('/')
third = split_list[-1]
first, second = split_list[-2].split('?t:ac=')

answered Apr 25, 2016 at 16:28

trans1st0r


1 Comment

SpoonMeiser Over a year ago

That seems awfully specific to this exact URL
