I want to check whether a URL string matches a keyword. For example, if the keyword is google.com, then google.com and https://google.com should return true, and so should google.com/search or anything similar. But google.com.id should return false, since it is a different URL. I tried the expression below, but it doesn't work. How should I write the regular expression? Thank you.

regexp.MatchString(`^(?:https?://)?([-a-z0-9]+)(?:\.`+keyword+`)*$`, urlstr)

By the way, as far as I understand, regular expressions can cause performance issues. Can anyone suggest other ways to handle this?

1 Answer

You can use

regexp.MatchString(`^(?:https?://)?(?:[^/.\s]+\.)*` + regexp.QuoteMeta(keyword) + `(?:/[^/\s]+)*/?$`, urlstr)

See the regex demo.

Details:

  • ^ - start of string
  • (?:https?://)? - an optional http:// or https://
  • (?:[^/.\s]+\.)* - zero or more repetitions of
    • [^/.\s]+ - one or more chars other than /, . and whitespace
    • \. - a dot
  • google\.com - an escaped keyword
  • (?:/[^/\s]+)* - zero or more repetitions of a / and then one or more chars other than / and whitespace chars
  • /? - an optional /
  • $ - end of string

Note that you need regexp.QuoteMeta to escape any special characters in the keyword, such as a ., which otherwise matches any character except line break characters.
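To see the pattern above in context, here is a minimal runnable sketch. The helper name matchesKeyword and the sample URLs are assumptions made for illustration; only the pattern itself comes from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesKeyword (hypothetical helper) reports whether urlstr points at the
// host named by keyword, with an optional scheme, optional subdomains and an
// optional path, using the pattern from the answer.
func matchesKeyword(keyword, urlstr string) (bool, error) {
	pattern := `^(?:https?://)?(?:[^/.\s]+\.)*` +
		regexp.QuoteMeta(keyword) + // escape the "." and any other metacharacters
		`(?:/[^/\s]+)*/?$`
	return regexp.MatchString(pattern, urlstr)
}

func main() {
	urls := []string{
		"google.com",         // true
		"https://google.com", // true
		"google.com/search",  // true
		"google.com.id",      // false: different host
	}
	for _, u := range urls {
		ok, err := matchesKeyword("google.com", u)
		if err != nil {
			fmt.Println("bad pattern:", err)
			return
		}
		fmt.Printf("%-20s %v\n", u, ok)
	}
}
```

Because of the (?:[^/.\s]+\.)* part, subdomains such as www.google.com also match; drop that group if only the bare host should be accepted.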


5 Comments

Thank you, it seems to work. Could the regular expression cause performance issues? Are there any other ways to handle this?
@Frank I think this regex is efficient enough not to cause any performance problems. Note that I assumed there will be no spaces in the URL. If you need to support whitespace, remove every \s from the pattern.
Understood, thank you. Let me benchmark it.
Hi, I found a bug: if the keyword is https://google.com (with https) but urlstr is google.com (without https), the result is false, although I expected true. How should I change the expression above?
@Frank This is a problem with your data/task definition. If you need to handle keywords with or without a protocol, make sure you only check the parts without the protocol: simply remove it with strings.Replace(strings.Replace(urlstr, "https://", "", 1), "http://", "", 1) before checking.
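The two follow-up points in the comments (performance, and a keyword that carries a protocol) can be sketched together. This is only an illustration under a few assumptions: the names stripScheme and newMatcher are made up here, strings.TrimPrefix is swapped in for the strings.Replace call from the comment so that only a leading scheme is removed, and the scheme is stripped from the keyword because that is where it appears in the reported case (the pattern already treats a scheme in urlstr as optional). Compiling once matters because regexp.MatchString recompiles the pattern on every call.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stripScheme (hypothetical helper) removes a leading "https://" or
// "http://" from s, if present.
func stripScheme(s string) string {
	s = strings.TrimPrefix(s, "https://")
	return strings.TrimPrefix(s, "http://")
}

// newMatcher (hypothetical helper) strips the scheme from keyword, compiles
// the answer's pattern once, and returns a function that can be reused for
// many URLs without recompiling.
func newMatcher(keyword string) (func(urlstr string) bool, error) {
	re, err := regexp.Compile(`^(?:https?://)?(?:[^/.\s]+\.)*` +
		regexp.QuoteMeta(stripScheme(keyword)) +
		`(?:/[^/\s]+)*/?$`)
	if err != nil {
		return nil, err
	}
	return re.MatchString, nil
}

func main() {
	match, err := newMatcher("https://google.com")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(match("google.com"))               // true
	fmt.Println(match("http://google.com/search")) // true
	fmt.Println(match("google.com.id"))            // false
}
```

If the keyword set is fixed, the returned function can be built at startup and shared; a compiled *regexp.Regexp is safe for concurrent use.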
