
I would like suggestions on extracting a substring from a set of URLs. The code I'm writing should extract this piece of information (the ID embedded in the URL) from the URLs in incoming events from our web tracker.

Take these URLs (the URLs that contain the substrings I'm looking for are in the format of the first three):

https://www.rbnett.no/sport/i/LA8gxP/_
https://www.itromso.no/sport/sprek/i/GGobq6/derfor-vraker-tromsoes-beste-loeper-sesongens-eneste-konkurranse-det-er-for-risikabelt-aa-delta
https://www.adressa.no/sport/fotball/i/9vyQGW/brann-treneren-ferdig-avsluttet-pressekonferansen-med-aa-sitere-max-manus
https://www.rbnett.no/dakapo/banner/
https://www.adressa.no/search/

where I want to extract the substrings "LA8gxP", "GGobq6" and "9vyQGW" from the first three URLs, respectively, without matching "dakapo", "banner" or "search" in the last two.

I'm asking for suggestions on a regex to extract that piece of information. As far as I know, the substrings only contain a-z, A-Z and 0-9. They currently seem to be exactly 6 characters long, but that will probably change over time.

The best solution (using Python) I have found so far is this:

import re
match = re.search(r"/i/([a-zA-Z0-9]+)/", url)
substring = match.group(1) if match else None  # None for URLs without an /i/<id>/ segment

It works, but I don't find it to be very elegant.
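For reference, a minimal self-contained sketch of the same regex approach applied to the five sample URLs, assuming the IDs are ASCII letters and digits of any length. The helper name `extract_id` is hypothetical; it compiles the pattern once and returns `None` when a URL has no `/i/<id>/` segment:

```python
import re
from typing import Optional

# Pattern from the question: an ASCII alphanumeric run between "/i/" and the next "/".
# The "+" quantifier does not fix the length, so IDs longer than 6 chars still match.
ID_PATTERN = re.compile(r"/i/([A-Za-z0-9]+)/")


def extract_id(url: str) -> Optional[str]:
    """Return the ID, or None if the URL has no /i/<id>/ segment (hypothetical helper)."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None


urls = [
    "https://www.rbnett.no/sport/i/LA8gxP/_",
    "https://www.itromso.no/sport/sprek/i/GGobq6/derfor-vraker-tromsoes-beste-loeper-sesongens-eneste-konkurranse-det-er-for-risikabelt-aa-delta",
    "https://www.adressa.no/sport/fotball/i/9vyQGW/brann-treneren-ferdig-avsluttet-pressekonferansen-med-aa-sitere-max-manus",
    "https://www.rbnett.no/dakapo/banner/",
    "https://www.adressa.no/search/",
]

for url in urls:
    print(extract_id(url))  # LA8gxP, GGobq6, 9vyQGW, None, None
```

One caveat of this pattern: because it requires a "/" after the ID, a URL that ends right after the ID (e.g. ".../i/LA8gxP" with no trailing slash) would not match.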

Also, it relies on the /i/ segment as a prefix. Even though that looks like a consistent pattern, I'm not 100% sure it is.
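If the regex feels fragile, one alternative is to parse the URL first and inspect only its path segments. This is a sketch under the same assumptions (the ID is ASCII alphanumeric and follows an "i" segment); it still depends on that marker, but a "/i/" inside a query string is ignored, a missing trailing slash is tolerated, and the marker is a hypothetical `marker` parameter so it is easy to change if the URL scheme turns out to differ:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# IDs are assumed to be ASCII letters and digits only (per the question).
ID_RE = re.compile(r"[A-Za-z0-9]+")


def extract_id_from_path(url: str, marker: str = "i") -> Optional[str]:
    """Return the path segment that follows `marker`, if it looks like an ID.

    Only the URL path is inspected, so "/i/" in the query string or fragment
    is ignored. `marker` is a hypothetical parameter, not part of any tracker API.
    """
    # Split the path into non-empty segments, e.g. ["sport", "i", "LA8gxP", "_"].
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Walk adjacent pairs and return the first segment preceded by the marker.
    for prev, seg in zip(segments, segments[1:]):
        if prev == marker and ID_RE.fullmatch(seg):
            return seg
    return None


print(extract_id_from_path("https://www.rbnett.no/sport/i/LA8gxP/_"))  # LA8gxP
print(extract_id_from_path("https://www.rbnett.no/sport/i/LA8gxP"))  # LA8gxP (no trailing slash)
print(extract_id_from_path("https://www.adressa.no/search/?q=/i/abc/"))  # None
```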

  • If you can't even express the defining rules for the substring in plain words (because you aren't sure about some things), there is no way to express it as a regular expression. Commented Aug 5, 2020 at 11:52
  • I'm aware of the uncertainty, but still want to check if anyone has a suggestion. Commented Aug 5, 2020 at 11:55
  • I would probably go for the /i/ approach. I don't really see any other syntax which seems as reliable as /i/. Commented Aug 5, 2020 at 12:00

1 Answer


The only other alternative I can think of is: \/i\/(.+)\/

Here is the demo: https://regex101.com/r/2iOyCE/1
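A quick sketch of this pattern in Python. In a Python raw string "/" needs no escaping, so r"\/i\/(.+)\/" and r"/i/(.+)/" behave the same. Note that ".+" is greedy and also matches "/" and non-alphanumeric characters, so on a URL with more path segments after the slug it captures too much; a non-greedy "(.+?)" stops at the first "/". The URL below is a hypothetical example, not one from the question:

```python
import re

# The answer's pattern, written without the unnecessary "/" escapes.
GREEDY = re.compile(r"/i/(.+)/")
# Non-greedy variant: the capture stops at the first "/" after the ID.
LAZY = re.compile(r"/i/(.+?)/")

# Hypothetical URL with an extra path segment after the slug.
url = "https://www.adressa.no/sport/i/9vyQGW/slug/extra"

print(GREEDY.search(url).group(1))  # 9vyQGW/slug
print(LAZY.search(url).group(1))    # 9vyQGW
```

For the three sample URLs in the question both variants return the expected IDs, because each has at most one "/" after the ID.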
