text = 'http://www.site.info www.escola.ninja.br google.com.ag'

expression: (http:\/\/)?((www\.)?\w+\.\w{2,}(\.\w{2,})?)

In JavaScript, this expression works and returns:

["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]

Why doesn't it work in Ruby?

For example:

  1. Using the match method:

    p text.match(/(http:\/\/)?(www\.)?\w+\.\w{2,}(\.\w{2})?/)
    #<MatchData "http://www.site.info" 1:"http://" 2:"www." 3:nil>
    
  2. Using the scan method:

    p text.scan(/(http:\/\/)?(www\.)?\w+\.\w{2,}(\.\w{2})?/)
    [["http://", "www.", nil], [nil, "www.", ".br"], [nil, nil, ".ag"]]
    

How can I return the following array instead?

["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
  • Because they are different languages with different functions for matching regular expressions... Commented Dec 20, 2017 at 21:25
  • Is there a reason why you're not just splitting on spaces (text.split(' ')), since that's what you're effectively doing with your regex? Or even just a simpler regex, like text.split(/\.?\s+/)? Commented Dec 20, 2017 at 21:33
  • Ruby and ECMAScript are two completely different languages that have nothing to do with each other. You simply cannot expect to copy and paste code back and forth between two completely different programming languages. That is just unreasonable. Commented Dec 21, 2017 at 10:21
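
A minimal sketch of the splitting approach suggested in the second comment, assuming the input is always a whitespace-separated list of URLs with nothing else in it (it does not check that each token looks like a URL). The variable names by_space and by_regex are illustrative only.

```ruby
text = 'http://www.site.info www.escola.ninja.br google.com.ag'

# A single-space string is special in String#split: it splits on runs of
# whitespace and ignores leading whitespace.
by_space = text.split(' ')

# Split on runs of whitespace, optionally preceded by a dot
# (for example, a sentence-ending period right after a URL).
by_regex = text.split(/\.?\s+/)

p by_space # ["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
p by_regex # ["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
```

This trades validation for simplicity: any token, URL or not, ends up in the result.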

2 Answers


Because, according to the documentation for Ruby's String#scan method:

If the pattern contains groups, each individual result is itself an array containing one entry per group.

So you can make the groups non-capturing by converting each (...) to (?:...), which gives the following expression:

text.scan(/(?:http:\/\/)?(?:(?:www\.)?\w+\.\w{2,}(?:\.\w{2,})?)/)
# => ["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
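
If you would rather keep the capturing groups (for example, to read the protocol separately), one alternative not mentioned in this answer is the block form of String#scan. Inside the block, $~ holds the MatchData for the current match, so the whole match is available alongside the groups. This is a sketch using the question's original pattern; the names url_re, full_matches and protocols are illustrative.

```ruby
text = 'http://www.site.info www.escola.ninja.br google.com.ag'

# The pattern from the question, with its capturing groups left as they are.
url_re = /(http:\/\/)?(www\.)?\w+\.\w{2,}(\.\w{2})?/

full_matches = []
protocols = []

# With a block, String#scan sets $~ to the MatchData of the current match,
# so $~[0] is the whole match and $~[1] is the first capturing group.
text.scan(url_re) do
  full_matches << $~[0]
  protocols << $~[1]
end

p full_matches # ["http://www.site.info", "www.escola.ninja.br", "google.com.ag"]
p protocols    # ["http://", nil, nil]
```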

The reason is that, in JavaScript, str.match(/regex/g) does not return captured substrings; see the MDN String.prototype.match() reference:

If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned.

In Ruby, you have to modify the pattern: remove redundant capturing groups and turn the remaining ones into non-capturing groups (that is, replace each unescaped ( with (?:). Otherwise, String#scan outputs only the captured substrings, as its documentation explains:

If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
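
To see the quoted rule in isolation, here is a tiny sketch on a made-up input (sample is a hypothetical string, not from the question) that compares a pattern with no groups, one with a capturing group, and one with a non-capturing group.

```ruby
sample = 'a1 b2'

# No groups: each result is the whole matched string ($&).
no_groups = sample.scan(/[a-z]\d/)

# A capturing group: each result is an array with one entry per group,
# and the rest of the match is dropped.
capturing = sample.scan(/([a-z])\d/)

# Non-capturing groups do not count as groups, so whole matches come back.
non_capturing = sample.scan(/(?:[a-z])\d/)

p no_groups     # ["a1", "b2"]
p capturing     # [["a"], ["b"]]
p non_capturing # ["a1", "b2"]
```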

Use

text = 'http://www.site.info www.escola.ninja.br google.com.ag'
puts text.scan(/(?:http:\/\/)?(?:www\.)?\w+\.\w{2,}(?:\.\w{2,})?/)

Output:

http://www.site.info
www.escola.ninja.br
google.com.ag
