Why does this regex check return true for this string?

Question

I need a regex that will determine if a string is a tweet URL. I've got this

Regexp.new(/http:|https:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i)

Why does it return true for the following?

"http://i.sstatic.net/QdOS0.jpg".match(Regexp.new(/http:|https:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i))? true : false
    => true

If you use Regexp.new('http://', 'i'), you save yourself some escaping trouble.
– giraff, Commented Feb 8, 2011 at 9:49
Answer is already given, but I just want to leave this site here; it always helps me out greatly when struggling with regexps: Rubular
– Maran, Commented Feb 8, 2011 at 9:58

Koraktor · Accepted Answer · 2011-02-08 09:52:15Z

4

The | splits the entire pattern in two, so http: is a complete alternative by itself and will always match a URL starting with http:

Try the following:

/https?:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i

The question mark will make the s optional, thus matching http or https.
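As a rough illustration of the difference, the sketch below compares the two forms. The patterns are cut down to a single branch for readability, and both URLs are made-up examples rather than values from the question.

```ruby
# Simplified one-branch versions of the patterns (an illustrative shortening,
# not the full pattern from the question).
broken = /http:|https:\/\/(twitter\.com\/.*\/status\/.*)/i
fixed  = /https?:\/\/(twitter\.com\/.*\/status\/.*)/i

image_url = "http://example.com/picture.jpg"            # hypothetical non-tweet URL
tweet_url = "https://twitter.com/someuser/status/12345" # hypothetical tweet URL

# With the top-level |, "http:" is an alternative of its own, so the match
# consists of nothing but the scheme prefix.
broken_match = broken.match(image_url)
puts broken_match[0]          # prints "http:"

# With https? the scheme is only a prefix of the one remaining alternative.
puts fixed.match?(image_url)  # prints false
puts fixed.match?(tweet_url)  # prints true
```

Inspecting the matched text (element 0 of the MatchData) is a quick way to see which alternative was actually taken.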

answered Feb 8, 2011 at 9:52

Toto · Accepted Answer · 2011-02-08 13:46:59Z

2

Your regex could be shortened to:

#^https?://(?:www\.|mobile\.)?twitter\.com/.*?/status(?:es)?/.*$#i

Explanation:

#                       regex delimiter
^                       start of line
https?                  http or https
://                     ://
(?:                     start of non-capturing group
www\.|mobile\.          www. or mobile.
)?                      end of group, which is optional
twitter\.com/           twitter.com/
.*?                     any number of any character, non-greedy
/status                 /status
(?:es)?                 optional non-capturing group containing `es`
/.*                     / followed by any number of any character
$                       end of string
#i                      delimiter and case-insensitive flag
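The # delimiters are not Ruby literal syntax, so here is one possible Ruby rendering of the shortened pattern, with the non-capturing groups written as (?:...). It is a sketch, not the answer's own code: the constant name TWEET_URL and the sample URLs are assumptions, and \A and \z are swapped in for ^ and $ because in Ruby those two anchor at line boundaries rather than at the ends of the string.

```ruby
# Extended mode (x) ignores whitespace in the pattern and allows comments.
TWEET_URL = %r{
  \A https?://            # http or https, anchored at the start of the string
  (?:www\.|mobile\.)?     # optional www. or mobile. prefix
  twitter\.com/
  .*?                     # user name part, non-greedy
  /status(?:es)?/         # status or statuses path segment
  .*                      # the rest of the URL
  \z                      # end of the string
}xi

puts TWEET_URL.match?("https://twitter.com/someuser/status/12345")         # prints true
puts TWEET_URL.match?("http://mobile.twitter.com/someuser/statuses/12345") # prints true
puts TWEET_URL.match?("http://example.com/picture.jpg")                    # prints false
```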

edited Feb 8, 2011 at 13:46

answered Feb 8, 2011 at 13:37

Ben · Accepted Answer · 2011-02-08 17:35:34Z

2

No need for regular expressions here (as usual).

require 'uri'
uri = URI.parse("http://www.twitter.com/status/12345")
p uri.host.split('.')[-2] == 'twitter' # returns true

More docs at: http://ruby-doc.org/stdlib/
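The snippet above only looks at the host. If the status part of the path matters as well, the same idea could be extended roughly as follows. The helper name tweet_url? and the exact rules it applies (host ending in twitter.com, a status or statuses path segment) are assumptions made for this sketch.

```ruby
require 'uri'

# Hypothetical helper: true when the URL is http(s), its host is twitter.com
# or a subdomain of it, and the path contains a status or statuses segment.
def tweet_url?(string)
  uri = URI.parse(string)
  return false unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  return false if uri.host.nil?

  host_ok = uri.host.downcase.split('.').last(2) == %w[twitter com]
  path_ok = uri.path.split('/').any? { |segment| %w[status statuses].include?(segment) }
  host_ok && path_ok
rescue URI::InvalidURIError
  # Strings that cannot be parsed as a URI are treated as non-tweets.
  false
end

puts tweet_url?("http://www.twitter.com/status/12345") # prints true
puts tweet_url?("http://i.sstatic.net/QdOS0.jpg")      # prints false
```

Comparing the last two host labels rather than only the second-to-last one also rejects hosts such as twitter.example.org.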

answered Feb 8, 2011 at 17:35

+1 for cutting to the chase and sidestepping the regex bullet.
– the Tin Man, Commented over a year ago

giraff · Accepted Answer · 2011-02-08 10:02:18Z

1

You should group your OR clauses, like this:

(http:|https:)

Additionally, it wouldn't hurt to anchor the beginning and end of the pattern:

^(http:|https:).*$

edited Feb 8, 2011 at 10:02

answered Feb 8, 2011 at 9:54

Or, if you don't need to capture the clause, (?:http:|https:).
– Phrogz, Commented over a year ago

a'r · Accepted Answer · 2011-02-08 09:51:37Z

0

The start of your regex specifies an option of just 'http:', which naturally matches the URL you are testing. Depending on how strict you need your check to be, you could just remove the http/https parts from the start of the regex.

answered Feb 8, 2011 at 9:51

Phrogz · Accepted Answer · 2011-02-08 14:45:32Z

0

While many other answers show you a better regex, the reason for the behaviour is that /foo|bar/ matches either foo or bar, and what you wrote was /http:|.../, so every URL containing http: is matched.

See @giraff's answer for how you could have written the alternation to do what you expect, or @Toto's or @Koraktor's answers for a better regexp.

And as posted in the comments, note that you can write a regex literal as %r{...} instead of /.../, which is nice when you want to use / characters in your regex without escaping them.
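A small sketch of the three spellings side by side may help. The variable names and the test URL are made up for illustration. Flags such as i go directly after the closing delimiter of a %r literal, just as they do after the closing slash.

```ruby
# The same pattern written three ways.
with_slashes = /https?:\/\/twitter\.com\//i    # every / must be escaped
with_percent = %r{https?://twitter\.com/}i     # no escaping of /, flag after the }
from_string  = Regexp.new('https?://twitter\.com/', Regexp::IGNORECASE)

url = "HTTPS://TWITTER.COM/someuser"           # hypothetical upper-case URL

puts with_slashes.match?(url)  # prints true
puts with_percent.match?(url)  # prints true
puts from_string.match?(url)   # prints true
```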

answered Feb 8, 2011 at 14:45

I like the %r-Syntax, but how would you add the i-modifier to it?
– giraff, Commented over a year ago
