
I need to parse a set of URLs and extract specific elements from them, but only under certain conditions. To explain further, consider this set of URLs:

http://www.example.com/appName1/some/extra/parts/keyword/rest/of/the/url
http://www.somewebsite.com/appName2/some/extra/parts/keyword/rest/of/the/url
http://www.someothersite.com/appname3/rest/of/the/url

As you can see, there are two kinds of URLs: those that contain the word "keyword" and those that don't. In my code, I receive only the part of the URL after the domain name (e.g. /appName1/some/extra/parts/keyword/rest/of/the/url).

I have two tasks. The first is to check whether the word "keyword" is present in the URL. The second, to be done only if "keyword" is not present, is to parse the URL into two groups: the app name and the rest of the URL (e.g. group 1: appname3 and group 2: rest/of/the/url for the third URL, since it doesn't contain "keyword"). The whole thing should be done in one regex.

My progress:

  • I was able to parse the app name and rest of the url into groups, but was not able to apply the condition.

  • I found a way to select strings that do not contain "keyword", though I'm not sure it's the right way to do it: ^((?!.*keyword).*)$

  • Next, to combine the two, I tried a construct I found after a long search, which has the syntax (?(?=regex)then|else). The result was:
    (?(?=^((?!.*keyword).*)$)\1)
    But I get an "invalid group structure" error.

I have gone through many Stack Overflow entries and tutorials, but couldn't find anything that meets this requirement. Please help me solve this.

  • "I have two tasks" What is second task? Commented Sep 24, 2016 at 17:00
  • group the components based on the result of the first task (filter urls without keyword). Sorry for not being clear, I've edited my question. Commented Sep 24, 2016 at 17:01
  • If "keyword" is in string do nothing? Commented Sep 24, 2016 at 17:03
  • Yes, completely ignore it. Commented Sep 24, 2016 at 17:04
  • Not certain if this is possible using a single RegExp Commented Sep 24, 2016 at 17:22

1 Answer

Yes, this is in fact possible. As far as I understand, you have the following cases:

  • /appName/some/extra/parts/keyword/rest/of/the/url
  • /appName/rest/of/the/url

You want your regex to not match the first one at all, while in the second case you want "appName" in one group and "rest/of/the/url" in another. The following regex will do that:

^(?!.*\/keyword\/)\/(.*?)\/(.*)$

Explanation:

  • ^ asserts position at the start of the string
  • (?!.*\/keyword\/) is a negative lookahead, and looks ahead to make sure the string does not contain /keyword/. This is where the magic happens
  • \/ matches "/", i.e. the slash right after the domain name
  • (.*?)\/ captures the first group (appname in your example) lazily, i.e. it matches as few characters as possible and stops at the next slash
  • (.*)$ is the group that captures "rest/of/the/url"
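
To make the behaviour concrete, here is a minimal sketch in Python. The language is an assumption, since the question does not name one, and the helper name split_path is hypothetical. Python does not need "/" escaped, so the backslashes are dropped; the pattern is otherwise the same as the one above.

```python
import re

# Same pattern as in the answer, minus the "\/" escapes (not needed in Python).
PATTERN = re.compile(r"^(?!.*/keyword/)/(.*?)/(.*)$")


def split_path(path):
    """Hypothetical helper: return (app_name, rest_of_url), or None when
    the path contains /keyword/ and the negative lookahead rejects it."""
    match = PATTERN.match(path)
    return match.groups() if match else None


# The three paths from the question, with the domain already stripped.
paths = [
    "/appName1/some/extra/parts/keyword/rest/of/the/url",
    "/appName2/some/extra/parts/keyword/rest/of/the/url",
    "/appname3/rest/of/the/url",
]

for path in paths:
    print(path, "->", split_path(path))
```

Only the third path produces groups; the first two fail to match at all, so there is nothing to read groups from. Note that the lookahead tests for /keyword/ with a slash on both sides, so a path that merely ends in /keyword would still match.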

4 Comments

Hi @Mathias-S, I tried this, but it seems it returns groups even when there is "keyword" in it. I'm not sure if the requirement was clear. If "keyword" is present in the url, it shouldn't return any groups.
So if keyword is present, you want to get the whole URL, if it's not present, you want the groups? Or do you want nothing at all if keyword is present?
if keyword is present, I don't want anything and if it's not there, the groups
Then you only need to use a negative lookahead. I've updated my answer with a regex that does that. Does this solve your issue?
