1

I have this URL:

/drive/rayon.productlist.seomenulevel/fh_refpath$003dfacet_1$0026fh_refview$003dlister$0026fh_view_size$003d100$0026fh_reffacet$003dcategories$0026auchan_page_type$003dcatalogue$0026fh_location$003d$00252f$00252f52$00252ffr_FR$00252fdrive_id$00253d993$00252fcategories$00253c$00257b52_3686967$00257d$00252fcategories$00253c$00257b52_3686967_3686326$00257d$00252fcategories$00253c$00257b52_3686967_3686326_3700610$00257d$00252fcategories$00253c$00257b52_3686967_3686326_3700610_3700620$00257d/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610

I want to extract the last three numbers: item[0] = 3700620, item[1] = 3686967 and item[2] = 3700610.

I tried this:

one =   url.split('/')[-1]
two =   url.split('/')[-2]

The first one gives 3700610

and the second one gives 3700620?t:ac=3686967

4 Answers

4

A non-regex approach would involve using urlparse and a bit of splitting:

>>> from urllib import parse as urlparse  # Python 3; on Python 2 use "import urlparse"
>>> parsed_url = urlparse.urlparse(url)
>>> number1 = parsed_url.path.split("/")[-1]
>>> number2, number3 = urlparse.parse_qs(parsed_url.query)["t:ac"][0].split("/")
>>> number1, number2, number3
('3700620', '3686967', '3700610')
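
On Python 3 the same idea can be wrapped in a small function. This is only a sketch: the name extract_ids and the shortened sample URL are made up for illustration, and it assumes the t:ac parameter, when present, always has the form a/b.

```python
from urllib.parse import urlparse, parse_qs

def extract_ids(url):
    """Return (last path segment, first t:ac part, second t:ac part), or None."""
    parsed = urlparse(url)
    last_segment = parsed.path.split("/")[-1]
    # parse_qs maps each parameter name to a list of its values
    values = parse_qs(parsed.query).get("t:ac")
    if not values:
        return None  # no t:ac parameter in the query string
    first, second = values[0].split("/")
    return last_segment, first, second

# Hypothetical shortened version of the URL from the question
sample = "/drive/rayon.productlist.seomenulevel/Capsules/3700620?t:ac=3686967/3700610"
print(extract_ids(sample))  # ('3700620', '3686967', '3700610')
```

Returning None for a missing parameter is a design choice for this sketch; raising an exception would work just as well if a missing t:ac should count as an error.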

Regex approach:

>>> import re
>>> re.search(r"/(\d+)\?t:ac=(\d+)/(\d+)$", url).groups()
('3700620', '3686967', '3700610')

Here each (\d+) is a capturing group that matches one or more digits, \? matches a literal question mark (it must be escaped because ? is a regex quantifier), and $ matches the end of the string.
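
If the one-line pattern is hard to read, the same expression can be written with re.VERBOSE so that every piece carries its own comment. A minimal sketch, assuming a shortened sample URL (hypothetical) and checking for a failed match before calling groups():

```python
import re

# Same pattern as above, spread over several lines; in VERBOSE mode
# whitespace is ignored and "#" starts a comment inside the pattern.
PATTERN = re.compile(r"""
    /(\d+)      # last path segment: one or more digits
    \?t:ac=     # literal "?t:ac="
    (\d+)       # first number in the parameter
    /(\d+)$     # slash, second number, end of string
""", re.VERBOSE)

# Hypothetical shortened version of the URL from the question
sample = "/drive/rayon.productlist.seomenulevel/Capsules/3700620?t:ac=3686967/3700610"

match = PATTERN.search(sample)
# search() returns None when nothing matches, so guard before using groups()
numbers = match.groups() if match else None
print(numbers)  # ('3700620', '3686967', '3700610')
```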

You can also name the groups and produce a dictionary:

>>> re.search(r"/(?P<number1>\d+)\?t:ac=(?P<number2>\d+)/(?P<number3>\d+)", url).groupdict()
{'number1': '3700620', 'number2': '3686967', 'number3': '3700610'}

1 Comment

This is why I hate regex. It just looks messy, although it is easy once you try to understand it.
2

Another solution using regex.

import re
re.findall(r'\d+', url)[-3:]  # raw string, so \d is not treated as a string escape
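
Note that this picks up the last three runs of digits anywhere in the string, so it only works while nothing containing digits follows the t:ac value. A small sketch of that behaviour; the helper name last_three_numbers, the shortened URL and the extra &page=2 parameter are all made up for illustration:

```python
import re

def last_three_numbers(url):
    """Return the last three runs of digits found anywhere in url."""
    numbers = re.findall(r"\d+", url)
    if len(numbers) < 3:
        raise ValueError("expected at least three numbers in the URL")
    return numbers[-3:]

# Hypothetical shortened version of the URL from the question
sample = "/Capsules$0020$002843$0029/3700620?t:ac=3686967/3700610"

print(last_three_numbers(sample))              # ['3700620', '3686967', '3700610']
# Any later digits shift the window, so the result is no longer the wanted one
print(last_three_numbers(sample + "&page=2"))  # ['3686967', '3700610', '2']
```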

1

The following two expressions extract the first two numbers (the third is simply url.split('/')[-1]):

url.split('/')[-2].split('=')[1]  # '3686967'
url.split('/')[-2].split('?')[0]  # '3700620'

1

Try this:

split_list = url.split('/')
third = split_list[-1]
first, second = split_list[-2].split('?t:ac=')

1 Comment

That seems awfully specific to this exact URL
