Regex URL Path from URL

Question

I am having a little bit of regex trouble.

I am trying to get the path, videoplay, from this URL:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello

If I use the regex /.+ it matches the //video.google.co.uk:80 part as well.

I need some kind of negative match that excludes the //.

When I have to use regexes on URLs fast and dirty, I usually include // at the beginning, before the capture group. Note that you can't rely on http://, because they might be accessing it using a different protocol, or even on ://, because they might specify the port number.
– jwrush, Commented Aug 19, 2012 at 1:06

Vlad Mysla · Accepted Answer · 2015-07-03 05:22:14Z

45

If you need this for your JavaScript web app: the best answer I have found on this topic is here. The basic (and original) version of the code looks like this:

var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";

parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"

Thank you John Long, you made my day!

M G · Accepted Answer · 2014-08-19 08:54:01Z

17

(http[s]?:\/\/)?([^\/\s]+\/)(.*)
The part after the host is captured in group 3.
Demo: http://regex101.com/r/vK4rV7/1
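
To make "group 3" concrete, here is a minimal sketch that applies the regex above to the URL from the question. The variable names and the final split step are illustrative additions, not part of the answer. Note that group 3 still carries the query string and the fragment, so an extra cut is needed if only the path is wanted.

```javascript
// Sketch: apply the regex from the answer and read capture group 3.
const re = /(http[s]?:\/\/)?([^\/\s]+\/)(.*)/;
const sampleUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello";

const match = sampleUrl.match(re);

// Group 3 holds everything after the first "/" that follows the host,
// so it still includes the query string and the fragment.
const rest = match ? match[3] : null;

// Hypothetical extra step: cut at the first "?" or "#" to keep only the path.
const pathOnly = rest === null ? null : rest.split(/[?#]/)[0];

console.log(rest);     // "videoplay?docid=-7246927612831078230&hl=en#hello"
console.log(pathOnly); // "videoplay"
```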

1 Comment

nbeuchat Over a year ago

It wouldn't work for a URL such as www.abc.com?param=xyz. I modified it slightly to make it work (I also use non-capturing groups for the first two groups): (?:https?:\/\/)?(?:[^?\/\s]+[?\/])(.*) Demo: regex101.com/r/eNUBb9

ThomasReggi · Accepted Answer · 2015-08-25 11:06:21Z

10

This expression captures everything from videoplay onward, i.e. the URL path plus the query string and fragment.

/\/(videoplay.+)/

This expression captures everything after the port, which also includes the path.

/:\d+\/(.+)/

However, if you are using Node.js, I recommend the native url module.

var url = require('url')
var youtubeUrl = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello"
url.parse(youtubeUrl)

It does all of the regex work for you:

{
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'video.google.co.uk:80',
  port: '80',
  hostname: 'video.google.co.uk',
  hash: '#hello',
  search: '?docid=-7246927612831078230&hl=en',
  query: 'docid=-7246927612831078230&hl=en',
  pathname: '/videoplay',
  path: '/videoplay?docid=-7246927612831078230&hl=en',
  href: 'http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello' 
}

1 Comment

darksinge Over a year ago

The url node module is in legacy mode. The docs recommend using the URL class instead. See here: nodejs.org/dist/latest-v14.x/docs/api/…
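
For comparison with the url.parse() output above, here is a small sketch using the URL class that the comment refers to. It assumes a Node.js version where URL is available as a global (it is also available in browsers). The variable name is illustrative.

```javascript
// Sketch: the WHATWG URL class, used on the URL from the question.
const videoUrl = new URL("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello");

console.log(videoUrl.pathname); // "/videoplay"
console.log(videoUrl.search);   // "?docid=-7246927612831078230&hl=en"
console.log(videoUrl.hash);     // "#hello"
console.log(videoUrl.hostname); // "video.google.co.uk"

// Unlike the legacy url.parse() result, the scheme's default port is dropped:
// 80 is the default for http, so port is an empty string here.
console.log(videoUrl.port);     // ""
```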

Mohammad Javad Khademian · Accepted Answer · 2021-08-11 09:06:31Z

6

For new Googlers: use the JavaScript URL Web API, which works in any environment:

new URL('your url string').pathname

https://developer.mozilla.org/en-US/docs/Web/API/URL/URL
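
One thing to keep in mind is that the URL constructor throws a TypeError when the string cannot be parsed, and that relative strings need a base. The helper below is a hypothetical wrapper (pathnameOrNull is not a built-in) sketching one way to handle both cases.

```javascript
// Hypothetical helper, not part of the answer: returns the pathname,
// or null when the input cannot be parsed as a URL.
function pathnameOrNull(input, base) {
  try {
    // The optional second argument lets relative inputs resolve against a base URL.
    return new URL(input, base).pathname;
  } catch (err) {
    return null;
  }
}

console.log(pathnameOrNull("http://video.google.co.uk:80/videoplay?docid=1#hello")); // "/videoplay"
console.log(pathnameOrNull("/q/123/regex-url?foo", "https://stackoverflow.com"));    // "/q/123/regex-url"
console.log(pathnameOrNull("not a url"));                                            // null
```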

5 Comments

suchislife Over a year ago

This is beautiful.

Max Barrass Over a year ago

Regex URL Path from URL?

Alin Over a year ago

He is asking about Regex not existing functions

Sel Over a year ago

This is perfect for SSR

xuhdev Over a year ago

"new Googlers"? Don't tell me this is some Google internal doc...

2 revs, 2 users 97% · Accepted Answer · 2020-01-23 13:42:58Z

function getPath(url, defaults){
    var reUrlPath = /(?:\w+:)?\/\/[^\/]+([^?#]+)/;
    var urlParts = url.match(reUrlPath) || [url, defaults];
    return urlParts.pop();
}
alert( getPath('http://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('https://stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('//stackoverflow.com/q/123/regex-url', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/?foo', 'unknown') );
alert( getPath('http://stackoverflow.com/q/123/regex-url/#foo', 'unknown') );
alert( getPath('http://stackoverflow.com/', 'unknown') );
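
One edge case worth knowing: as far as I can trace it, the regex in getPath requires at least one path character, so for an input with no path at all, such as 'http://stackoverflow.com', the engine backtracks and returns the last character of the host ("m"). The variant below is a sketch with two assumed tweaks (the name getPathStrict is made up for illustration): the host part stops at "/", "?" or "#", and the path group may be empty.

```javascript
// Same idea as getPath above, with two assumed tweaks:
//  - the host part stops at "/", "?" or "#"
//  - the path group is allowed to be empty
function getPathStrict(url, defaults) {
  var reUrlPath = /(?:\w+:)?\/\/[^\/?#]+([^?#]*)/;
  var urlParts = url.match(reUrlPath);
  return urlParts ? urlParts[1] : defaults;
}

console.log(getPathStrict('http://stackoverflow.com/q/123/regex-url?foo', 'unknown')); // "/q/123/regex-url"
console.log(getPathStrict('http://stackoverflow.com', 'unknown'));                     // ""
console.log(getPathStrict('http://stackoverflow.com?foo', 'unknown'));                 // ""
console.log(getPathStrict('no-slashes-here', 'unknown'));                              // "unknown"
```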

Community · Accepted Answer · 2017-05-23 11:55:04Z

4

You can try this:

^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$

([^?]+) above is the capturing group which returns your path.

Please note that this is not an all-URL regex. It just solves your problem of matching all the text between the first "/" occurring after "//" and the following "?" character.

If you need an all-matching regex, you can check this StackOverflow link, where all the possibilities of a URI are discussed and dissected into its constituent parts, including your "path".
If you consider that overkill and you know that your input URL will always have your path between the first "/" and the following "?", then the above regex should be sufficient.
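
A short sketch of how the pattern above behaves in JavaScript. It is built with String.raw so the "/" characters need no escaping. The helper name firstGroup and the example.com input are illustrative assumptions. The second call shows the stated limitation: without a "?", a trailing fragment stays in the captured group.

```javascript
// The pattern from the answer, built from a raw string so "/" needs no escaping.
const pathRe = new RegExp(String.raw`^(?:[^/]*(?:/(?:/[^/]*/?)?)?([^?]+)(?:\??.+)?)$`);

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function firstGroup(url) {
  const m = url.match(pathRe);
  return m ? m[1] : null;
}

// URL from the question: the group is the text between the host's "/" and the "?".
console.log(firstGroup("http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello")); // "videoplay"

// Sample input without a "?": the fragment is not separated from the path.
console.log(firstGroup("http://example.com/a/b#frag")); // "a/b#frag"
```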

answered Aug 19, 2012 at 3:08 by Kash; edited May 23, 2017 at 11:55 by Community Bot

1 Comment

FiftiN Over a year ago

Try this url: video.google.co.uk:80?docid=-7246927612831078230&hl=en#hell…, this regex returns group1 = o

Nolequen · Accepted Answer · 2021-11-10 08:01:19Z

2

Even though the answers using language features are good, here is one more way to split a URL into its components using a regex:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 ||            |  |          |       |  |        | |
 12 - scheme   |  |          |       |  |        | |
               3  4 - authority, includes hostname/ip and port number.
                             5 - path|  |        | |
                                     6  7 - query| |
                                                 8 9 - fragment
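
As a sketch of how the numbered groups map to components, here is the same pattern written as a JavaScript regex literal (with "/" escaped and a single backslash before the "?" of the query group) and applied to the URL from the question. The object and variable names are illustrative.

```javascript
// The component-splitting regex as a JavaScript literal.
const uriRe = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

const parts = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello".match(uriRe);

// Every group in the pattern is optional, so a match is always returned;
// groups that did not take part are undefined.
const components = {
  scheme: parts[2],    // "http"
  authority: parts[4], // "video.google.co.uk:80"
  path: parts[5],      // "/videoplay"
  query: parts[7],     // "docid=-7246927612831078230&hl=en"
  fragment: parts[9],  // "hello"
};

console.log(components.path); // "/videoplay"
```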

Alin · Accepted Answer · 2022-03-14 15:33:33Z

I have worked on it extensively and here is the result:

(?i)(?<scheme>http|https|ftp|sftp|sip|sips|file):\/\/(?:(?<username>[^`!@#$^&*()+=,:;'"{}\|\[\]\s\/\\]+)(?::(?<password>[^`!@#$^&*()+=,:;'"{}\|\[\]\s\/\\]+))?@)?(?:(?<ipv4>((?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?)))|\[(?<ipv6>(?i)(?:[\da-f]{0,4}:){1,7}(?:(?<ipv4_in_ipv6>(?:(?:25[0-5]|2[0-4]\d|1?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|1?\d\d?))|[\da-f]{0,4}))\]|(?:(?<sub_domain>[^\s~`!@#$%^&*()_+=,.?:;'"{}\|\[\]\/\\]+\.)*(?<domain>[^\s~`!@#$%^&*()_+=,.?:;'"{}\|\[\]\/\\]+)(?<tld>\.[^\s~`!@#$%^&*()\-_+=,.?:;'"{}\|\[\]\/\\0-9]{2,})))+(?<port>:\d+)?(?:\/(?<path>\/?[^\s`@#$^&=.?"{}\\]+\/)*(?<file>[^\s`@#$^&=?"{}\/\\]+)?(?<query>\?[^\s`#$^"{}\\]+)*(?<fragment>#[^\s`$^&=?"{}\/\\]+)?)?

So, in your case, you just need to take the group that contains the path and add the word you want, i.e. videoplay. To be more specific, I am talking about this:

(?:\/videoplay(?<path>\/?[^\s`@#$^&=.?"{}\\]+\/)*(?<file>[^\s`@#$^&=?"{}\/\\]+)?(?<query>\?[^\s`#$^"{}\\]+)*(?<fragment>#[^\s`$^&=?"{}\/\\]+)?)?

Niet the Dark Absol · Accepted Answer · 2012-08-19 01:06:04Z

1

You mean a negative lookbehind? (?<!/)
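
A minimal sketch of how the lookbehind could be used for the URL in the question. The full pattern is an assumption on my part: it pairs the lookbehind with a negative lookahead so that neither slash of "//" can start the match, and it needs a JavaScript engine with lookbehind support (ES2018 or later).

```javascript
// Assumed pattern: a "/" that is not preceded by "/" (lookbehind) and not
// followed by "/" (lookahead), then everything up to the first "?" or "#".
const pathRe = /(?<!\/)\/(?!\/)[^?#]*/;

const m = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#hello".match(pathRe);
const matchedPath = m ? m[0] : null;

console.log(matchedPath); // "/videoplay"
```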

Peter · Accepted Answer · 2020-05-05 18:55:55Z

1

var subject =
'<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=ec617d715196"><link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a"><link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">';
var re=/\"[a-z]+:\/\/[^ ]+"/m;
document.write(subject.match(re));

You can try this

/\"[a-z]+:\/\/[^ ]+/

Usage

if (/\"[a-z]+:\/\/[^ ]+/m.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}

answered May 5, 2020 at 13:16; edited May 5, 2020 at 18:55

Toby Allen · Accepted Answer · 2012-08-19 18:33:52Z

0

It's not a regex solution, but most languages have a URL library that will parse any URL into its constituent parts. This may be a better solution for what you are doing.

Mohammad Hosseini Parto · Accepted Answer · 2021-11-10 07:33:50Z

-1

Please try this:

^http[s]?:\/\/(www\.)?(.*)?\/?(.)*

Firas Dib · Accepted Answer · 2012-08-19 11:29:57Z

-2

I think this is what you're after: [^/]+$

Demo: http://regex101.com/r/rG8gB9

1 Comment

justderb Over a year ago

This doesn't match the path of a URL, just the very last part of the path. With "google.com/foo/bar" it matches "bar"
