3

Context


Using Ruby, I am parsing strings that look like this:

A type with an ID...

[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]

...with between 0 and n additional options separated with @...

[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]

In this example:

[Image=4b5da003ee133e8368000002@size:small@media:true]

I want to retrieve:

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. size:small
  5. media:true

Problem


Right now using this regex:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(@[a-zA-Z]+:[a-zA-Z]+)*\])

I get...

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. @media:true

What am I doing wrong? How can I get what I want?

PS: All the results are copied from http://rubular.com/, which is handy for debugging regexes. Please use it if it helps you help me :)


Edit: if it's impossible to get all the options separately, how could I get this:

  1. [Image=4b5da003ee133e8368000002@size:small@media:true]
  2. Image
  3. 4b5da003ee133e8368000002
  4. @size:small@media:true
2
  • Your edited question is actually pretty easy. Since you've already matched up to the first @, you then just need to take everything after that. (\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:[^\]])*)\]) seems to give the desired effect. Commented Feb 9, 2010 at 16:07
  • @mmyers: yes an answer is correct. I'd prefer if someone could give me a solution without the edit though, but if it is impossible... Commented Feb 9, 2010 at 16:09

5 Answers

3

Edit:

Ruby's regex implementation does not seem to support multiple captures on one group, as most other regex engines do. Therefore, you'll need two steps: first capture all the @*:* options in one string, then split that string.

To get all of them, this should work:

(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])
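A minimal sketch of that two-step approach, assuming the input format from the question. The names `TAG` and `parse_tag` are made up for illustration and are not part of any library.

```ruby
# Step 1: capture the whole "@key:value..." tail in one group.
# Step 2: split that tail on "@" to get the individual options.
TAG = /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\])/

# Hypothetical helper: returns [full, type, id, *options], or nil if no match.
def parse_tag(str)
  m = TAG.match(str)
  return nil unless m

  full, type, id, tail = m.captures
  # Splitting "@size:small@media:true" on "@" yields a leading "", so drop empties.
  options = tail.split("@").reject(&:empty?)
  [full, type, id, *options]
end

p parse_tag("[Image=4b5da003ee133e8368000002@size:small@media:true]")
p parse_tag("[Image=4b5da003ee133e8368000002]")
```

When there are no options, the fourth group matches the empty string, so the options list simply comes back empty.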

4 Comments

Can you elaborate on how to get both values for the 4th group?
@matthew: If it's not possible to get what I wanted in the first place then @lucero is right
@Lucero: if by "multiple captures on one group" you mean something like .NET's GroupCapture construct with its ability to return all intermediate captures, that's actually very rare. AFAIK, only .NET and Perl 6/Parrot provide that capability.
@Alan, okay, "most" may be wrong, I had the wrong impression because I used those which support it or didn't need the intermediate captures and therefore didn't notice... thanks for clarifying this.
2

To get the "tail" of options, you could fetch it from $4 with

/(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/

and then split on at-signs.

For example:

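#!/usr/bin/ruby

str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"
if /(\[([a-zA-Z]+)=([a-zA-Z0-9]+)((@[a-zA-Z]+:[a-zA-Z]+)*)\])/.match(str)
  # $1 = whole tag, $2 = type, $3 = ID, $4 = full "@..." tail
  puts $1, $2, $3, $4

  # Splitting on "@" leaves an empty first element; rejecting empties
  # also keeps this safe when there are no options and $4 is "".
  $4.split(/@/).reject(&:empty?).each do |s|
    puts s
  end
end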

Output:

[Image=4b5da003ee133e8368000002@size:small@media:true]
Image
4b5da003ee133e8368000002
@size:small@media:true
size:small
media:true

2 Comments

Thanks for the answer, but this is not really what I want. This gets me 3:@size:small@media:true and 4:@media:true
@marcgg See the program output in my answer.
1
(\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\])

will give you media:true. Note that media:true is overwriting the previous size:small match. I don't think there's a way to get exactly what you want in a single match call.
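A short sketch of the overwriting behaviour, followed by one possible second pass using `String#scan`. It assumes the string has already been checked against the full pattern, since the `scan` pattern on its own does not validate the surrounding brackets.

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# A capture group inside a repeated group keeps only its last iteration.
m = str.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)(?:@([a-zA-Z]+:[a-zA-Z]+))*\]/)
last_option = m[3]

# A second pass with String#scan collects every "@key:value" pair.
# With one capture group, scan returns an array of one-element arrays.
all_options = str.scan(/@([a-zA-Z]+:[a-zA-Z]+)/).flatten

p last_option
p all_options
```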

1 Comment

Thanks for the answer. I do need both options in a single call. I edited my question to reflect that
1

It looks like the regex only keeps the last match. I think getting the full list of matches will require a different approach, such as splitting the string:

"a=b@c:d@e:f".split(/=|@/)

which creates a list:

["a", "b", "c:d", "e:f"]

which is close to what you want...
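Applied to the bracketed strings from the question, the same idea might look like the sketch below. It assumes the string is already known to be a well-formed tag, because `split` performs no validation.

```ruby
str = "[Image=4b5da003ee133e8368000002@size:small@media:true]"

# Strip the surrounding brackets, then split on "=" or "@".
inner = str[1..-2]
type, id, *options = inner.split(/=|@/)

p type
p id
p options
```

The splat in the assignment collects however many options follow the ID, including none at all.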

1 Comment

This is what I was going to suggest, too. So much easier this way.
1

Although it can be tricky to do it purely within a regexp, it's not too hard to split it out as a two-step operation:

while (line = DATA.gets)
  line.chomp!

  if (m = line.match(/\[([a-zA-Z]+)=([a-zA-Z0-9]+)((?:@[a-zA-Z]+:[a-zA-Z]+)*)\]/))
    (type, hash, options) = m.to_a[1, 3]
    options = options.split(/@/).reject { |s| s.empty? }
    puts [ type, hash, options.join(',') ].join(' / ')
  end
end

__END__
[Image=4b5da003ee133e8368000002]
[Video=679hfpam9v56dh800khfdd32]
[Image=4b5da003ee133e8368000002@size:small]
[Image=4b5da003ee133e8368000002@size:small@media:true]
[Image=4b5da003ee133e8368000002@size:small@media:true@foo:bar]

This produces the output:

Image / 4b5da003ee133e8368000002 / 
Video / 679hfpam9v56dh800khfdd32 / 
Image / 4b5da003ee133e8368000002 / size:small
Image / 4b5da003ee133e8368000002 / size:small,media:true
Image / 4b5da003ee133e8368000002 / size:small,media:true,foo:bar
