32

I'm trying to extract the values in a string that sit between << and >>. They can occur multiple times in the same string.

Can anyone help with a regular expression to match these:

this is a test for <<bob>> who like <<books>>
test 2 <<frank>> likes nothing
test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.

I then want to foreach over the GroupCollection to get all the values.

Any help gratefully received. Thanks.

4 Answers

61

Use positive lookbehind and lookahead assertions to match the angle brackets without including them in the match, and use .*? to match the shortest possible sequence of characters between them. Find all values by iterating the MatchCollection returned by the Matches() method.

var regex = new Regex("(?<=<<).*?(?=>>)");

foreach (Match match in regex.Matches(
    "this is a test for <<bob>> who like <<books>>"))
{
    Console.WriteLine(match.Value);
}

LiveDemo in DotNetFiddle
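
For completeness, here is a minimal sketch that runs the same lookaround pattern over all three sample strings from the question. The class name LookaroundDemo and the helper Extract are hypothetical names chosen for illustration, not part of the original answer. The loop variable is typed as Match because on older frameworks MatchCollection only implements the non-generic IEnumerable, so var would give you an object there.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Same pattern as in the answer: text preceded by << and followed by >>.
    private static readonly Regex Pattern = new Regex("(?<=<<).*?(?=>>)");

    // Hypothetical helper: returns every value found between << and >>.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match match in Pattern.Matches(input))
        {
            // The brackets are only asserted, not consumed, so Value is the inner text.
            values.Add(match.Value);
        }
        return values;
    }

    public static void Main()
    {
        var samples = new[]
        {
            "this is a test for <<bob>> who like <<books>>",
            "test 2 <<frank>> likes nothing",
            "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>."
        };

        foreach (var sample in samples)
        {
            Console.WriteLine(string.Join(", ", Extract(sample)));
        }
    }
}
```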

1 Comment

No pun intended, but this is exactly what I'm after. Thank you for your really quick responses.

4

While Peter's answer is a good example of using lookarounds for left- and right-hand context checking, I'd like to add a LINQ (lambda) way to access matches/groups and show the use of simple numbered capturing groups, which come in handy when you want to extract only a part of the pattern:

using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// ...

var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
            .Cast<Match>()
            .Select(x => x.Groups[1].Value);

The same approach works with the Regex instance from Peter's answer, where the whole match value is accessed via Match.Value:

var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);

Note:

  • <<(.*?)>> is a regex matching <<, then capturing any 0 or more chars as few as possible (due to the non-greedy *? quantifier) into Group 1 and then matching >>
  • RegexOptions.Singleline makes . match newline (LF) chars, too (it does not match them by default)
  • Cast<Match>() casts the match collection to an IEnumerable<Match> that you may further access using a lambda
  • Select(x => x.Groups[1].Value) only returns the Group 1 value from the current x match object
  • Note you may further create a list or array of the obtained values by adding .ToList() or .ToArray() after Select.
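
To make the RegexOptions.Singleline and .ToList() notes concrete, here is a small sketch. The class name SinglelineDemo, the helper Extract, and the two-tag input string are made up for illustration; the pattern is the one used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: Group 1 values of <<(.*?)>> materialized as a list.
    public static List<string> Extract(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"<<(.*?)>>", options)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }

    public static void Main()
    {
        // The second value contains a line feed between "sec" and "ond".
        string input = "<<first>> and <<sec\nond>>";

        // Without Singleline, . stops at the line feed, so only "first" is found.
        Console.WriteLine(Extract(input, RegexOptions.None).Count);       // 1

        // With Singleline, . also matches the line feed, so both values are found.
        Console.WriteLine(Extract(input, RegexOptions.Singleline).Count); // 2
    }
}
```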

In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:

var strs = new List<string> { "this is a test for <<bob>> who like <<books>>",
                              "test 2 <<frank>> likes nothing",
                              "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>." };
foreach (var s in strs) 
{
    var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
            .Cast<Match>()
            .Select(x => x.Groups[1].Value);
    Console.WriteLine(string.Join(", ", results));
}

Output:

bob, books
frank
what, on, earth, this, is, too, much

3

You can try one of these:

(?<=<<)[^>]+(?=>>)
(?<=<<)\w+(?=>>)

However, you will still have to iterate the returned MatchCollection to get every value.
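
The two patterns are stricter than .*? in different ways, which the following sketch tries to show. The class name PatternComparison, the helper Extract, and the test input are assumptions made for illustration. [^>]+ needs at least one character and cannot cross a >, while \w+ accepts only word characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Hypothetical helper: collects the whole-match values for a given pattern.
    public static List<string> Extract(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            values.Add(match.Value);
        }
        return values;
    }

    public static void Main()
    {
        // Made-up input: a plain word, a value with a space,
        // a value containing '>', and an empty value.
        string input = "<<bob>> <<two words>> <<a>b>> <<>>";

        // Finds "bob" and "two words"; skips "a>b" and the empty value.
        Console.WriteLine(string.Join(" | ", Extract(@"(?<=<<)[^>]+(?=>>)", input)));

        // Finds only "bob", since the other values are not pure word characters.
        Console.WriteLine(string.Join(" | ", Extract(@"(?<=<<)\w+(?=>>)", input)));

        // Finds all four values, including "a>b" and the empty string.
        Console.WriteLine(Extract(@"(?<=<<).*?(?=>>)", input).Count);
    }
}
```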

0

Something like this, with the value captured in a named group:

<<(?<element>[^>]*)>>

This program might be useful:

http://sourceforge.net/projects/regulator/
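
As a rough sketch of reading the named group from each match, the code below assumes the pattern is applied once per tag, without an outer repetition, because a pattern that can match the empty string would also produce empty matches from Regex.Matches. The class name NamedGroupDemo and the helper Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: reads the "element" named group of every match.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(input, @"<<(?<element>[^>]*)>>"))
        {
            // Named groups are looked up by name on the match's GroupCollection.
            values.Add(match.Groups["element"].Value);
        }
        return values;
    }

    public static void Main()
    {
        var values = Extract("this is a test for <<bob>> who like <<books>>");
        Console.WriteLine(string.Join(", ", values)); // bob, books
    }
}
```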
