25

I am having some trouble with a regex.

I am trying to extract the path, /videoplay, from this URL:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello

If I use the regex /.+, the match also includes //video.google.co.uk:80.

I need some kind of negative match that excludes the //.

2
  • 1
    When I have to use regexes on urls fast and dirty, I usually include // at the beginning, before the capture group. Note you can't do http://, because they might be accessing it using a different protocol, or even ://, because they might specify the port number. Commented Aug 19, 2012 at 1:06
  • possible duplicate of Getting parts of a URL (Regex) Commented Jun 4, 2015 at 2:07
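
The first comment's suggestion (put // before the capture group) could look roughly like the sketch below. It assumes JavaScript and assumes the path should stop at the first ? or #; the function name pathAfterAuthority is hypothetical.

```javascript
// Hypothetical helper: anchor on "//", skip the authority (host and optional
// port), then capture everything up to the query string or fragment.
function pathAfterAuthority(url) {
  const match = /\/\/[^\/?#]+(\/[^?#]*)/.exec(url);
  return match ? match[1] : null;
}

console.log(pathAfterAuthority(
  "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
)); // "/videoplay"
```

Because the pattern does not mention the scheme or the port, it should also cope with protocol-relative URLs such as //example.com/a/b.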

13 Answers

45

If you need this for a JavaScript web app, the best answer I have found on this topic is here. The basic (and original) version of the code looks like this:

var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";

parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"

Thank you, John Long, you made my day!


Comments

17

(http[s]?:\/\/)?([^\/\s]+\/)(.*) group 3
Demo: http://regex101.com/r/vK4rV7/1

1 Comment

It wouldn't work for a URL such as www.abc.com?param=xyz. I slightly modified it like this to make it work (I also use non-capturing groups for the first two groups). (?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*) Demo: regex101.com/r/eNUBb9
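
A short sketch of how the modified pattern from this comment behaves in JavaScript. The names afterHost and restAfterHost are hypothetical, chosen only for illustration.

```javascript
// Pattern from the comment: optional scheme, then the host up to the first
// "/" or "?", then capture the remainder of the string.
const afterHost = /(?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*)/;

// Hypothetical helper: returns the captured remainder, or null on no match.
function restAfterHost(url) {
  const match = afterHost.exec(url);
  return match ? match[1] : null;
}

console.log(restAfterHost("http://www.abc.com/foo/bar")); // "foo/bar"
console.log(restAfterHost("www.abc.com?param=xyz"));      // "param=xyz"
```

Two things are worth keeping in mind: the capture has no leading slash and still contains any query string or fragment, and a URL with no path at all, such as http://example.com, backtracks and yields "/example.com".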
10

This expression captures videoplay and everything after it, i.e. the URL path plus the query string and fragment.

/\/(videoplay.+)/

This expression captures everything after the port, which also contains the path.

/:\d+\/(.+)/
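
As a rough illustration, a port-anchored pattern could be applied in JavaScript like this. The sketch assumes the URL always carries an explicit port, as in the question, and uses \d+ with an escaped slash so that ports of any length match; the function name restAfterPort is hypothetical.

```javascript
// Hypothetical helper: ":" + one or more digits + "/", then capture the rest.
// Returns null when the URL has no explicit port followed by a path.
function restAfterPort(url) {
  const match = /:\d+\/(.+)/.exec(url);
  return match ? match[1] : null;
}

console.log(restAfterPort(
  "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
)); // "videoplay?docid=-7246927612831078230&hl=en#hello"
```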

However, if you are using Node.js, I recommend the native url module.

var url = require('url');
var videoUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";
console.log(url.parse(videoUrl));

This does all of the regex work for you:

{
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'video.google.co.uk:80',
  port: '80',
  hostname: 'video.google.co.uk',
  hash: '#hello',
  search: '?docid=-7246927612831078230&hl=en',
  query: 'docid=-7246927612831078230&hl=en',
  pathname: '/videoplay',
  path: '/videoplay?docid=-7246927612831078230&hl=en',
  href: 'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello' 
}

1 Comment

The url node module is in legacy mode. The docs recommend using the URL class instead. See here: nodejs.org/dist/latest-v14.x/docs/api/…
6

For new Googlers: use the JavaScript Web API URL class, which is available in browsers and in Node.js:

new URL('your url string').pathname

https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
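
A minimal sketch of wrapping this in a function, assuming the input may not always be a valid absolute URL. The helper name getPathname is hypothetical.

```javascript
// Hypothetical helper: returns the pathname, or null when the input is not
// an absolute URL that the URL constructor can parse.
function getPathname(input) {
  try {
    return new URL(input).pathname;
  } catch (err) {
    // new URL() throws a TypeError on invalid input
    return null;
  }
}

console.log(getPathname(
  "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
)); // "/videoplay"
console.log(getPathname("not a url")); // null
```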

5 Comments

This is beautiful.
Regex URL Path from URL?
He is asking about Regex not existing functions
This is perfect for SSR
"new Googlers"? Don't tell me this is some Google internal doc...
5

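function getPath(url, defaults){
    // Optional scheme, "//", the host, then capture the path up to "?" or "#"
    var reUrlPath = /(?:\w+:)?\/\/[^\/]+([^?#]+)/;
    // On no match, fall back to an array whose last element is the default
    var urlParts = url.match(reUrlPath) || [url, defaults];
    // The last element is either the captured path or the default
    return urlParts.pop();
}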
alert( getPath('http://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('https://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('//stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/', 'unknown') );

Comments

4

You can try this:

^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$

([^?]+) above is the capturing group which returns your path.

Please note that this is not an all-URL regex. It just solves your problem of matching all the text between the first "/" occurring after "//" and the following "?" character.

If you need an all-matching regex, you can check this StackOverflow link, where they have discussed and dissected all possibilities of a URI into its constituent parts, including your "path".
If you consider that overkill and you know that your input URL will always have your path between the first "/" and the following "?", then the above regex should be sufficient.

1 Comment

Try this url: video.google.co.uk:80?docid=-7246927612831078230&hl=en#hell…, this regex returns group1 = o
2

Even though the answers using language features are good, here is one more way to split a URL into its components using a regex:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

2 - scheme
4 - authority, includes hostname/ip and port number
5 - path
7 - query
9 - fragment
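
A sketch of using this pattern from JavaScript, written as a regex literal with the slashes escaped and a single-escaped \? for the query delimiter. The pattern corresponds to the one given in RFC 3986, Appendix B; the names uriParts and splitUri are hypothetical.

```javascript
// Component-splitting pattern as a JavaScript regex literal.
const uriParts = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: every part of the pattern is optional, so exec()
// always returns a match; absent components come back as undefined.
function splitUri(uri) {
  const m = uriParts.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

const parts = splitUri(
  "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
);
console.log(parts.path);      // "/videoplay"
console.log(parts.authority); // "video.google.co.uk:80"
```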

Comments

2

I have worked on it extensively and here is the result:

(?i)(?<scheme>http|https|ftp|sftp|sip|sips|file):\/\/(?:(?<username>[^`!@#$^&*()+=,:;'"{}\|\[\]\s\/\\]+)(?::(?<password>[^`!@#$^&*()+=,:;'"{}\|\[\]\s\/\\]+))?@)?(?:(?<ipv4>((?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?)))|\[(?<ipv6>(?i)(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4_in_ipv6>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))\]|(?:(?<sub_domain>[^\s~`!@#$%^&*()_+=,.?:;'"{}\|\[\]\/\\]+\.)*(?<domain>[^\s~`!@#$%^&*()_+=,.?:;'"{}\|\[\]\/\\]+)(?<tld>\.[^\s~`!@#$%^&*()\-_+=,.?:;'"{}\|\[\]\/\\0-9]{2,})))+(?<port>:\d+)?(?:\/(?<path>\/?[^\s`@#$^&=.?"{}\\]+\/)*(?<file>[^\s`@#$^&=?"{}\/\\]+)?(?<query>\?[^\s`#$^"{}\\]+)*(?<fragment>#[^\s`$^&=?"{}\/\\]+)?)?

Demo | Git Repository

So, in your case, you only need the part of the expression that contains the path group, with the word you want (i.e. videoplay) added to it. To be more specific, I am talking about this:

(?:\/videoplay(?<path>\/?[^\s`@#$^&=.?"{}\\]+\/)*(?<file>[^\s`@#$^&=?"{}\/\\]+)?(?<query>\?[^\s`#$^"{}\\]+)*(?<fragment>#[^\s`$^&=?"{}\/\\]+)?)?

Comments

1

You mean a negative lookbehind? (?<!/)
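
One way the lookbehind idea might be fleshed out, as a sketch only: pair it with a negative lookahead so that neither slash of the // after the scheme can start the match. This assumes a JavaScript engine with lookbehind support (recent versions of Node.js and modern browsers); the function name pathViaLookbehind is hypothetical.

```javascript
// Hypothetical helper: find the first "/" that is neither preceded nor
// followed by another "/", then take everything up to "?" or "#".
function pathViaLookbehind(url) {
  const match = /(?<!\/)\/(?!\/)[^?#]*/.exec(url);
  return match ? match[0] : null;
}

console.log(pathViaLookbehind(
  "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
)); // "/videoplay"
```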

Comments

1

var subject =
'<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=ec617d715196"><link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"><link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">';
var re=/\"[a-z]+:\/\/[^ ]+"/m;
document.write(subject.match(re));

You can try this

/\"[a-z]+:\/\/[^ ]+/

Usage

if (/\"[a-z]+:\/\/[^ ]+/m.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}

Comments

0

It's not a regex solution, but most languages have a URL library that will parse any URL into its constituent parts. This may be a better solution for what you are doing.

Comments

-1

Please try this:

^http[s]?:\/\/(www\.)?(.*)?\/?(.)*

Comments

-2

I think this is what you're after: [^/]+$

Demo: http://regex101.com/r/rG8gB9

1 Comment

This doesn't match the path of a URL, just the very last part of the path. With "google.com/foo/bar" it matches "bar"
