Getting some elements in a string using a regex

Question

Context

Using Ruby I am parsing strings looking like this:

A type with an ID...

[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]

...with between 0 and n additional options separated with @...

[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]

In this example:

[Image=4b5da003ee133e8368000002@size:small@media:true]

I want to retrieve:

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
size:small
media:true

Problem

Right now using this regex:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(@[a-zA-Z]+:[a-zA-Z]+)*\])

I get...

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@media:true

What am I doing wrong? How can I get what I want?

PS: All the results are copied from http://rubular.com/, which is handy for debugging regexes. Please use it if it helps you help me :)

Edit: if it's impossible to get all the options separately, how could I get this:

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@size:small@media:true

Your edited question is actually pretty easy. Since you've already matched up to the first @, you then just need to take everything after that. (\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:[^\]])*)\]) seems to give the desired effect.
– Michael Myers ♦, Commented Feb 9, 2010 at 16:07
@mmyers: yes an answer is correct. I'd prefer if someone could give me a solution without the edit though, but if it is impossible...
– marcgg, Commented Feb 9, 2010 at 16:09

Lucero · Accepted Answer · 2010-02-09 16:10:14Z

3

Edit:

Ruby's regex implementation does not seem to support multiple captures on one group, as most other regex engines do. You'll therefore need two steps: first capture all the @*:* options as one string, then split that string.

To get all of them, this should work:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])
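A minimal sketch of the two steps, assuming the regex above. The constant name TAG and the helper parse_tag are made up for illustration and are not part of the original answer:

```ruby
# Regex from the answer above: group 4 holds the whole option tail.
TAG = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])/

# Hypothetical helper: returns [whole, type, id, option, option, ...]
# or nil when the string does not match.
def parse_tag(str)
  m = TAG.match(str)
  return nil unless m

  whole, type, id, tail = m.captures
  # Step 2: the tail looks like "@size:small@media:true" (or "" when there
  # are no options); split on "@" and drop the empty leading piece.
  options = tail.split("@").reject(&:empty?)
  [whole, type, id, *options]
end

p parse_tag("[Image=4b5da003ee133e8368000002@size:small@media:true]")
p parse_tag("[Video=679hfpam9v56dh800khfdd32]")
```

For a tag without options the tail is an empty string, so the result simply contains no option entries.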

edited Feb 9, 2010 at 16:10

answered Feb 9, 2010 at 15:58

Lucero


4 Comments

Matthew Flaschen Over a year ago

Can you elaborate on how to get both values for the 4th group?

marcgg Over a year ago

@matthew: If it's not possible to get what I wanted in the first place then @lucero is right

Alan Moore Over a year ago

@Lucero: if by "multiple captures on one group" you mean something like .NET's GroupCapture construct with its ability to return all intermediate captures, that's actually very rare. AFAIK, only .NET and Perl 6/Parrot provide that capability.

Lucero Over a year ago

@Alan, okay, "most" may be wrong, I had the wrong impression because I used those which support it or didn't need the intermediate captures and therefore didn't notice... thanks for clarifying this.

Greg Bacon · Accepted Answer · 2010-02-09 16:34:39Z

2

To get the "tail" of options, you could fetch it from $4 with

/(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/

and then split on at-signs.

For example:

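#! /usr/bin/ruby

str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
if /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/.match(str)
  # $1 = whole tag, $2 = type, $3 = ID, $4 = option tail ("" if there are none)
  puts $1, $2, $3, $4

  # Split the tail on "@" and drop the empty piece before the first "@".
  # Unlike $4[1..-1], this also works when the tail is empty.
  $4.split(/@/).reject { |s| s.empty? }.each do |s|
    puts s
  end
end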

Output:

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@size:small@media:true
size:small
media:true

edited Feb 9, 2010 at 16:34

answered Feb 9, 2010 at 16:14

Greg Bacon


2 Comments

marcgg Over a year ago

Thanks for the answer, but this is not really what I want. This gets me 3:@size:small@media:true and 4:@media:true

Greg Bacon Over a year ago

@marcgg See the program output in my answer.

Matthew Flaschen · Accepted Answer · 2010-02-09 15:58:56Z

1

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])

will give you media:true. Note that media:true is overwriting the previous size:small match. I don't think there's a way to get exactly what you want in a single match call.
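To make the overwriting visible, here is a small sketch that runs the regex above and then, as an assumption of mine rather than part of this answer, uses a second call to String#scan to collect every option. The variable names are illustrative only:

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
re  = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])/

m = re.match(str)
# Group 4 sits inside a repeated group, so only its final iteration is kept.
last_option = m[4]

# Second pass: scan returns one array per match, each holding the captured
# option, so flatten yields a plain list of options.
all_options = str.scan(/@([a-zA-Z]+:[a-zA-Z]+)/).flatten

p last_option
p all_options
```

This still takes two calls, which is consistent with the point above that a single match call does not return every option.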

answered Feb 9, 2010 at 15:58

Matthew Flaschen


1 Comment

marcgg Over a year ago

Thanks for the answer. I do need both options in a single call. I edited my question to reflect that

Chris Hulan · Accepted Answer · 2010-02-09 16:42:58Z

1

It looks like the regex only keeps the last match. I think to get the list of matches will require a different approach.

"a=b@c:d@e:f".split(/=|@/)

which creates a list:

["a", "b", "c:d", "e:f"]

which is close to what you want...
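Applied to one of the tags from the question, the same idea might look like the sketch below. It assumes the string is already known to be a single well-formed tag, since split does no validation; the variable names are illustrative:

```ruby
tag = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# Drop the surrounding "[" and "]" before splitting.
inner = tag[1..-2]

# The first piece is the type, the second the ID, the rest are options.
type, id, *options = inner.split(/=|@/)

p type
p id
p options
```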

answered Feb 9, 2010 at 16:42

Chris Hulan


1 Comment

glenn mcdonald Over a year ago

This is what I was going to suggest, too. So much easier this way.

tadman · Accepted Answer · 2010-02-09 18:16:57Z

Although it can be tricky to do it purely within a regexp, it's not too hard to split it out as a two-step operation:

while (line = DATA.gets)
  line.chomp!

  if (m = line.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/))
    (type, hash, options) = m.to_a[1, 3]
    options = options.split(/@/).reject { |s| s.empty? }
    puts [ type, hash, options.join(',') ].join(' / ')
  end
end

__END__
[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]
[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]
[Image=4b5da003ee133e8368000002@size:small@media:true@foo:bar]

This produces the output:

Image / 4b5da003ee133e8368000002 / 
Video / 679hfpam9v56dh800khfdd32 / 
Image / 4b5da003ee133e8368000002 / size:small
Image / 4b5da003ee133e8368000002 / size:small,media:true
Image / 4b5da003ee133e8368000002 / size:small,media:true,foo:bar
