
Is there a way to write a one-line regex that captures only specific parts of a URL like this?

ftp://trial.com:50/papers/history.pdf

getting only ftp, trial.com and 50.

market://find/tools/new

getting only market and find

    How do you have a regexp with no language and no tools? Commented Jan 22, 2016 at 18:25

3 Answers

1

Try this regex:

\/\/|\/.*|(\w+)

Regex live here.

Explaining:

            # match without grouping what you do not want
\/\/        # two slashes
|           # OR
\/.*        # everything after the first alone-slash
|           # OR
            # now match grouping what you want
(\w+)       # each desired word in group 1
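As a rough illustration of how this alternation behaves, here is a minimal Python sketch. Python is an assumption (the question names no language), and `wanted_words` is a hypothetical helper. It keeps only the matches in which group 1 took part; the `//` and the path are matched but discarded.

```python
import re

# Same pattern as above; the slashes need no escaping in Python.
PATTERN = re.compile(r"//|/.*|(\w+)")

def wanted_words(url):
    """Return every group-1 capture, skipping matches that only consumed slashes or the path."""
    return [m.group(1) for m in PATTERN.finditer(url) if m.group(1)]

print(wanted_words("ftp://trial.com:50/papers/history.pdf"))
# ['ftp', 'trial', 'com', '50']
print(wanted_words("market://find/tools/new"))
# ['market', 'find']
```

Note that `\w+` does not match a dot, so trial.com comes back as two separate words rather than one.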

Hope it helps


4 Comments

Thanks, it works, but it also catches ://. Is there a way to avoid it?
"getting only ftp, trial.com and 50": I guess the OP means as separate groups.
@user1938193. What language/tool are you using?
@WashingtonGuedes it seems like he used "regex101.com" ;)
0

I think the question is how to extract part of the matching string, not how to match the whole string. Some tools let you use parentheses (which must be escaped in sed's basic syntax) to capture substrings for this purpose. Consider this example with sed:

 echo ftp://trial.com/hist.pdf | sed 's/^\(.\+\):\/\/\([^\/]\+\)\/\?.*$/\1 \2/'

The sed command has the form s/regexp/replacement/: it matches regexp and replaces the match with replacement. The first pair of escaped parentheses captures the .\+ part, which \1 prints in the output. The second pair captures what comes after the // and before the next /, which \2 prints. \+ means one or more repetitions, whereas * means zero or more. The parentheses must be escaped to capture substrings for use in the replacement; unescaped, they match literal parenthesis characters. Note that \+ and \? are GNU sed extensions to basic regular expressions.

The ^ anchors the match at the beginning of the line. .\+ is at least one character of any kind. :\/\/ matches the literal ://. The [^\/]\+ inside the second pair of parentheses is at least one character that is not /, and it is followed by \/\? (an optional /). Lastly, .*$ matches everything up to the end of the line.
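For readers without sed, the same substitution can be sketched in Python. This is an assumed translation, not part of the original command: `scheme_and_host` is a hypothetical name, and Python's `re` syntax uses unescaped parentheses, `+` and `?` where sed's basic syntax needs backslashes.

```python
import re

def scheme_and_host(url):
    """Mimic the sed substitution: replace the whole line with 'scheme host'."""
    # ^(.+)://   scheme, at least one character before ://
    # ([^/]+)    host part, everything up to the next /
    # /?.*$      optional slash and the rest of the line, discarded
    return re.sub(r"^(.+)://([^/]+)/?.*$", r"\1 \2", url)

print(scheme_and_host("ftp://trial.com/hist.pdf"))
# ftp trial.com
print(scheme_and_host("ftp://trial.com:50/papers/history.pdf"))
# ftp trial.com:50
```

The second call shows a limit of this pattern: a port is not split off, because [^/]+ happily consumes the colon and digits.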

0
(\w+):\/\/([\w\.]+)(:(\d+))?.*

Or a less restrictive version (be careful):

(.+?):\/\/([^:\/\?]+)(:(\d+))?.*

And the groups:

$1 is the protocol
$2 is the domain
$4 is the port (optional)
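To show how these groups might be read in practice, here is a small Python sketch using the first (stricter) pattern. Python and the helper name `split_url` are assumptions made for illustration; in Python the groups are accessed with `group(1)`, `group(2)` and `group(4)` rather than `$1`, `$2` and `$4`.

```python
import re

# Stricter pattern from above; the dot needs no escape inside brackets.
URL_RE = re.compile(r"(\w+)://([\w.]+)(:(\d+))?.*")

def split_url(url):
    """Return (protocol, domain, port); port is None when the URL has none."""
    m = URL_RE.match(url)
    if m is None:
        return None
    # Group 3 is ":port" including the colon, so the bare port is group 4.
    return m.group(1), m.group(2), m.group(4)

print(split_url("ftp://trial.com:50/papers/history.pdf"))
# ('ftp', 'trial.com', '50')
print(split_url("market://find/tools/new"))
# ('market', 'find', None)
```

Returning None for the missing port keeps the optional group explicit instead of substituting a default.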

Examples and explanations here.

2 Comments

Could you fix the example url? It doesn't work. Thanks
You do not need to escape the dot inside brackets
