Find substring of url using regular expression

Question

I would like suggestions on extracting a substring from a range of URLs. The code I'm writing should extract this piece of info (the actual id of the URL) from URLs in incoming events from our web tracker.

Take these URLs (the ones that contain the substrings I'm looking for follow the format of the first three):

https://www.rbnett.no/sport/i/LA8gxP/_
https://www.itromso.no/sport/sprek/i/GGobq6/derfor-vraker-tromsoes-beste-loeper-sesongens-eneste-konkurranse-det-er-for-risikabelt-aa-delta
https://www.adressa.no/sport/fotball/i/9vyQGW/brann-treneren-ferdig-avsluttet-pressekonferansen-med-aa-sitere-max-manus
https://www.rbnett.no/dakapo/banner/
https://www.adressa.no/search/

I want to extract the substrings "LA8gxP", "GGobq6" and "9vyQGW" from the first three URLs respectively, without hitting "dakapo", "banner" or "search" in the last two.

I'm asking for suggestions on a regexp to extract that piece of info. As far as I know, the substrings only contain a-z, A-Z, and 0-9. The substrings seem to be only 6 chars long, but that will probably change over time.

The best solution (using Python) I have found so far is this:

import re

match = re.search(r"/i/([a-zA-Z0-9]+)/", url)
# re.search returns None when the URL has no /i/<id>/ segment
substring = match.group(1) if match else None

It works, but I don't find it to be very elegant.

Also, it's relying on having the /i/-pattern as a prefix. Even though it looks like a consistent pattern, I'm not 100% sure if it is.
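
As a rough check of the pattern above against the five sample URLs, here is a minimal sketch. The helper name `extract_id` is hypothetical, and it assumes the id is always the alphanumeric path segment directly after /i/ and is followed by another slash. It returns None for URLs without such a segment, because `re.search` returns None there and calling `.group(1)` on that would raise an AttributeError.

```python
import re

# Assumption: the id is the alphanumeric segment right after "/i/".
ID_PATTERN = re.compile(r"/i/([a-zA-Z0-9]+)/")


def extract_id(url):
    """Return the id following /i/, or None if the URL has no such segment."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


urls = [
    "https://www.rbnett.no/sport/i/LA8gxP/_",
    "https://www.itromso.no/sport/sprek/i/GGobq6/derfor-vraker-tromsoes-beste-loeper-sesongens-eneste-konkurranse-det-er-for-risikabelt-aa-delta",
    "https://www.adressa.no/sport/fotball/i/9vyQGW/brann-treneren-ferdig-avsluttet-pressekonferansen-med-aa-sitere-max-manus",
    "https://www.rbnett.no/dakapo/banner/",
    "https://www.adressa.no/search/",
]

ids = [extract_id(u) for u in urls]
print(ids)  # ['LA8gxP', 'GGobq6', '9vyQGW', None, None]
```

Using `+` rather than `{6}` keeps the pattern working if the ids grow beyond 6 characters.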

If you can't even express the defining rules for the substring in plain words (because you aren't sure about some things), there is no way to express them as a regular expression.
– Michael Butscher, Commented Aug 5, 2020 at 11:52
I'm aware of the uncertainty, but still want to check if anyone has a suggestion.
– Jørgen Frøland, Commented Aug 5, 2020 at 11:55
I would probably go for the /i/ attempt. I don't really see any other syntax which seems as reliable as /i/.
– sp4c38, Commented Aug 5, 2020 at 12:00
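
If the /i/ marker is kept, as the last comment suggests, one possible variation is to parse the URL first and look at path segments instead of matching the raw string. The sketch below is only an illustration: the function name `id_from_path` and its `marker` parameter are hypothetical, and the URL with a query string is a made-up variation rather than one from the question. Unlike the regex in the question, it does not need a trailing slash after the id.

```python
import re
from urllib.parse import urlsplit


def id_from_path(url, marker="i"):
    """Return the alphanumeric path segment that follows `marker`, else None."""
    # urlsplit separates the path from the query string and fragment.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Walk over adjacent segment pairs and look for marker -> id.
    for current, following in zip(segments, segments[1:]):
        if current == marker and re.fullmatch(r"[a-zA-Z0-9]+", following):
            return following
    return None


print(id_from_path("https://www.rbnett.no/sport/i/LA8gxP/_"))        # LA8gxP
print(id_from_path("https://www.rbnett.no/dakapo/banner/"))          # None
# Hypothetical URL: id is the last segment and a query string follows.
print(id_from_path("https://www.adressa.no/sport/i/9vyQGW?ref=x"))   # 9vyQGW
```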

ISHAN JAISWAL · Accepted Answer · 2020-08-05 16:08:38Z


The only other alternative I can think of is: \/i\/(.+)\/

Here is the demo: https://regex101.com/r/2iOyCE/1
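
One caveat worth checking with this pattern: `.+` is greedy, so it captures up to the last slash in the URL rather than the first one after the id. The sketch below compares it with a bounded variant, `[^/]+`, which stops at the next slash. The second URL is hypothetical (it is not from the question) and only illustrates a path with a trailing slash after the title.

```python
import re

# Same pattern as above; "/" needs no escaping in a Python regex.
GREEDY = re.compile(r"/i/(.+)/")
# Bounded variant: the capture stops at the first "/" after the id.
BOUNDED = re.compile(r"/i/([^/]+)/")

sample = "https://www.rbnett.no/sport/i/LA8gxP/_"
# Hypothetical URL with a trailing slash after the title segment.
trailing = "https://www.example.com/sport/i/LA8gxP/some-title/"

greedy_sample = GREEDY.search(sample).group(1)
greedy_trailing = GREEDY.search(trailing).group(1)
bounded_trailing = BOUNDED.search(trailing).group(1)

print(greedy_sample)     # LA8gxP
print(greedy_trailing)   # LA8gxP/some-title
print(bounded_trailing)  # LA8gxP
```

On the sample URLs from the question both patterns give the same result, since only one slash follows the id there.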
