How to regex the strings in a URL

Question

http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6

I am trying to extract the values bOhxBeD, SyhyTGi and so on. This is what I came up with (yes, fairly simple): /([a-zA-Z0-9]{7})/. It seems to work with PCRE:

([a-zA-Z0-9]{7})

But when it comes to Ruby, I use it like this:

str.match(/([a-zA-Z0-9]{7})/)
#<MatchData "bOhxBeD" 1:"bOhxBeD">

It doesn't seem to work: only the first value comes back. Can anyone point out what's wrong with this regex? Thanks.

Not an answer to your actual question, but this may be better suited to methods other than regexes, e.g.: require 'uri'; URI(str).path[1..-1].split(',')
– Tim Peters, Commented Aug 27, 2014 at 6:43
@TimPeters this is a good answer too, thanks. But somehow, when I look at this kind of thing, I think about regex anyway, so I try hard to learn it properly. Still, nice solution there :)
– Tzu ng, Commented Aug 27, 2014 at 6:45
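
The non-regex approach suggested in the comment above could look roughly like this. It is only a sketch: SAMPLE_URL is a shortened version of the URL in the question, ids_from_path is a name made up for the example, and it assumes the IDs make up the entire path of the URL.

```ruby
require 'uri'

# Shortened version of the URL from the question (assumed to have the same shape).
SAMPLE_URL = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Hypothetical helper: take the URL path, drop the leading "/",
# then split what is left on commas.
def ids_from_path(url)
  URI(url).path[1..-1].split(',')
end

p ids_from_path(SAMPLE_URL)
# => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```

Since the host name never reaches the split, there is no need to filter out pieces such as "something" afterwards.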

Avinash Raj · Accepted Answer · 2014-08-27 06:12:26Z

3

You need to add the word boundary \b in order to match exactly 7 alphanumeric characters, and use scan (rather than match) to get every match.

\b[a-zA-Z0-9]{7}\b

irb(main):006:0> "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB,U72gx2J,kQTIRy9,7VXgGDw,eSxIcK6,S5oNlnn,WBHHsLk,BdMGd2d,U9kNlsF,cHVyc7Y,D83kaJ5,cLWgdSO,iWtCIF3,ount8L6".scan(/\b([a-zA-Z0-9]{7})\b/)
=> [["bOhxBeD"], ["SyhyTGi"], ["TMDDSIB"], ["U72gx2J"], ["kQTIRy9"], ["7VXgGDw"], ["eSxIcK6"], ["S5oNlnn"], ["WBHHsLk"], ["BdMGd2d"], ["U9kNlsF"], ["cHVyc7Y"], ["D83kaJ5"], ["cLWgdSO"], ["iWtCIF3"], ["ount8L6"]]
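
To see what the word boundaries change, here is a small sketch that runs the pattern with and without \b. SAMPLE_URL is a shortened version of the URL in the question, and the two method names are made up for the example. The capture group is left out here, so scan returns a flat array instead of the nested one shown above.

```ruby
# Shortened version of the URL from the question (assumed to have the same shape).
SAMPLE_URL = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Hypothetical helper: any run of seven alphanumerics, even inside a longer word.
def seven_char_runs(str)
  str.scan(/[a-zA-Z0-9]{7}/)
end

# Hypothetical helper: only runs with a word boundary on both sides.
def seven_char_words(str)
  str.scan(/\b[a-zA-Z0-9]{7}\b/)
end

p seven_char_runs(SAMPLE_URL)
# => ["somethi", "bOhxBeD", "SyhyTGi", "TMDDSIB"]
p seven_char_words(SAMPLE_URL)
# => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```

Without \b the first seven letters of "something" slip in; with it, only the standalone seven-character tokens remain.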

edited Aug 27, 2014 at 6:12

answered Aug 27, 2014 at 6:04


vks · Accepted Answer · 2014-08-27 05:49:17Z

2

 (?!.*?\/)[a-zA-Z0-9]{7}

It should be this. Otherwise it will also pick up 7-letter runs from the rest of the link, so "somethi" would be in the answer, and I guess that is not wanted.

answered Aug 27, 2014 at 5:49


1 Comment

vks Over a year ago

It uses a negative lookahead, i.e. (?!...). The main pattern cannot match at any position where the lookahead's pattern matches. Here the lookahead looks for a / anywhere ahead of the current position, so matches can only come from the part of the string after the last /. That is what was required.
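
A minimal sketch of that lookahead used with scan in Ruby. SAMPLE_URL is a shortened version of the URL in the question and ids_after_last_slash is a name made up for the example.

```ruby
# Shortened version of the URL from the question (assumed to have the same shape).
SAMPLE_URL = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Hypothetical helper: (?!.*?\/) fails wherever a "/" still lies ahead,
# so the seven-character pattern can only match after the last slash.
def ids_after_last_slash(str)
  str.scan(/(?!.*?\/)[a-zA-Z0-9]{7}/)
end

p ids_after_last_slash(SAMPLE_URL)
# => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```

Note that this relies on every ID being exactly seven characters long; a longer token after the last slash would be cut into seven-character pieces.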

Adrian · Accepted Answer · 2014-08-27 08:35:04Z

2

match only picks up the first match.
You can try the global version of match, which is scan.
You can also use scan with a negated character class [^...] to find runs of characters that exclude specific characters:

str.scan(/[^\/\.\,]+/)[3..-1]   
#=> ["bOhxBeD", "SyhyTGi", "TMDDSIB", "U72gx2J", "kQTIRy9", "7VXgGDw", "eSxIcK6", "S5oNlnn", "WBHHsLk", "BdMGd2d", "U9kNlsF", "cHVyc7Y", "D83kaJ5", "cLWgdSO", "iWtCIF3", "ount8L6"]

Update:
If you know that the strings between the commas are always 7 characters, you can use this instead:

   str.scan(/[^\/\.\,]{7}/)[1..-1]
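
The difference between match and scan can be checked side by side. This is a sketch only: SAMPLE_URL is a shortened version of the URL in the question, and SEGMENT, first_segment and all_segments are names made up for the example.

```ruby
# Shortened version of the URL from the question (assumed to have the same shape).
SAMPLE_URL = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Seven characters that are not "/", "." or ",".
SEGMENT = /[^\/\.\,]{7}/

# match stops at the first hit and returns a single MatchData object.
def first_segment(str)
  str.match(SEGMENT)[0]
end

# scan keeps going and returns every non-overlapping hit as an array.
def all_segments(str)
  str.scan(SEGMENT)
end

p first_segment(SAMPLE_URL)
# => "somethi"
p all_segments(SAMPLE_URL)[1..-1]
# => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```

The [1..-1] slice drops the leading "somethi" that comes from the host name.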

edited Aug 27, 2014 at 8:35

answered Aug 27, 2014 at 6:19


Igor Guzak · Accepted Answer · 2014-08-27 06:09:23Z

1

This happens because your regexp matches just one 7-character element and nothing more. A simple solution could be:

str.match(/\/([^\/]*)\z/)[1].split(',')
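
One thing to watch with an end-anchored pattern of this shape: a regex match begins at the leftmost position where it can succeed, so a capture of (.*) after \/ would start at the first slash of "http://" rather than the last one. The sketch below compares the two captures. SAMPLE_URL is a shortened version of the URL in the question and both method names are made up for the example.

```ruby
# Shortened version of the URL from the question (assumed to have the same shape).
SAMPLE_URL = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Hypothetical helper: "." also matches "/", so the match starts at the first slash.
def tail_from_first_slash(str)
  str.match(/\/(.*)\z/)[1].split(',')
end

# Hypothetical helper: excluding "/" from the capture forces the match
# to start at the last slash.
def tail_from_last_slash(str)
  str.match(/\/([^\/]*)\z/)[1].split(',')
end

p tail_from_first_slash(SAMPLE_URL)
# => ["/something.com/bOhxBeD", "SyhyTGi", "TMDDSIB"]
p tail_from_last_slash(SAMPLE_URL)
# => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
```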

edited Aug 27, 2014 at 6:09

answered Aug 27, 2014 at 5:58


Cary Swoveland · Accepted Answer · 2014-08-27 07:57:15Z

1

You could use String#[] and String#split:

str[/.*\/(.*)/,1].split(',')
  #=> ["bOhxBeD", "SyhyTGi", "TMDDSIB", "U72gx2J", "kQTIRy9", "7VXgGDw",
  #    "eSxIcK6", "S5oNlnn", "WBHHsLk", "BdMGd2d", "U9kNlsF", "cHVyc7Y",
  #    "D83kaJ5", "cLWgdSO", "iWtCIF3", "ount8L6"]

.*\/ in the regex, "greedy" as it is, will consume characters up to and including the last forward slash in the string. Capture group #1 (.*) sucks up the remainder of the string and, due to the presence of ,1, returns it. split(',') then breaks up the string to give you the desired array.

Another way:

str[str[/.*\//].size..-1].split(',')
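
As a quick check that the two forms agree, both can be wrapped in small methods and run on the same input. SAMPLE_URL is a shortened version of the URL in the question, and the method names are made up for this sketch.

```ruby
# Shortened version of the URL from the question (assumed to have the same shape).
SAMPLE_URL = "http://something.com/bOhxBeD,SyhyTGi,TMDDSIB"

# Hypothetical helper: greedy .*\/ runs to the last slash; group 1 is the remainder.
def ids_via_capture(str)
  str[/.*\/(.*)/, 1].split(',')
end

# Hypothetical helper: measure the prefix up to and including the last slash,
# then slice the string from that offset.
def ids_via_offset(str)
  str[str[/.*\//].size..-1].split(',')
end

p ids_via_capture(SAMPLE_URL)
# => ["bOhxBeD", "SyhyTGi", "TMDDSIB"]
p ids_via_offset(SAMPLE_URL) == ids_via_capture(SAMPLE_URL)
# => true
```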

edited Aug 27, 2014 at 7:57

answered Aug 27, 2014 at 7:43
