Extract variables from text using RegEx and C#

Question

I have a possibly simple task ahead of me, but my RegEx skills are poor. Can anyone help me, or point me in the right direction? :-)

Below is an example of the text I'm parsing. I would like to do a foreach over the results, where for each match I can get the value of the "URL" variable and the text between the tags:

Lorem ipsum dolor sit amet, consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing elit. Nullam interdum eleifend mauris, nec condimentum nisi lacinia sit amet. Mauris faucibus, orci ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis volutpat, dolor libero sollicitudin quam, id feugiat magna orci[URL=/Default.aspx?ID=333333]lorem ipsum dolor[/URL] quis augue. Integer nec euismod sem.

This may be some help: regular-expressions.info/tutorial.html
– Purplegoldfish, Commented Oct 19, 2011 at 10:27
How about using String.IndexOf() to find the URL value, then reading from that index up to where the next URL string is found? Hope you get the idea.
– Zenwalker, Commented Oct 19, 2011 at 10:28
When you feel comfortable enough, you can take a look at this gem: shop.oreilly.com/product/9781565922570.do
– FailedDev, Commented Oct 19, 2011 at 10:43
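
For comparison, Zenwalker's String.IndexOf() suggestion might look roughly like the sketch below. The class name UrlTagScanner and the method Extract are made up for illustration, and the code assumes well-formed, non-nested [URL=...]...[/URL] tags.

```csharp
using System;
using System.Collections.Generic;

static class UrlTagScanner
{
    // Hypothetical helper (not from the thread): collects (url, text) pairs
    // using plain string searching instead of a regular expression.
    public static List<(string Url, string Text)> Extract(string input)
    {
        const string open = "[URL=";
        const string close = "[/URL]";
        var results = new List<(string Url, string Text)>();
        int pos = 0;

        while (true)
        {
            // Find the next opening tag; stop when there are no more.
            int start = input.IndexOf(open, pos, StringComparison.Ordinal);
            if (start < 0) break;

            // The URL runs from just after "[URL=" to the next ']'.
            int urlStart = start + open.Length;
            int urlEnd = input.IndexOf(']', urlStart);
            if (urlEnd < 0) break;

            // The link text runs from after that ']' to the closing tag.
            int textStart = urlEnd + 1;
            int textEnd = input.IndexOf(close, textStart, StringComparison.Ordinal);
            if (textEnd < 0) break;

            results.Add((input.Substring(urlStart, urlEnd - urlStart),
                         input.Substring(textStart, textEnd - textStart)));
            pos = textEnd + close.Length;
        }
        return results;
    }

    static void Main()
    {
        string sample = "consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing";
        foreach (var (url, text) in Extract(sample))
        {
            Console.WriteLine($"Url: {url} Text: {text}");
        }
    }
}
```

This is more code than a regex for the same result, but it needs no pattern syntax at all.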

Michael Low · Accepted Answer · 2011-10-19 10:38:29Z

4

This should do it for you:

using System.Text.RegularExpressions;

// Group 1 captures the URL after "URL="; group 2 captures the text between the tags.
Regex theRegex = new Regex(@"\[URL=([^\]]+)\]([^\[]+)\[/URL\]");
string text = "Lorem ipsum dolor sit amet, consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing elit. Nullam interdum eleifend mauris, nec condimentum nisi lacinia sit amet. Mauris faucibus, orci ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis volutpat, dolor libero sollicitudin quam, id feugiat magna orci[URL=/Default.aspx?ID=333333]lorem ipsum dolor[/URL] quis augue. Integer nec euismod sem.";
MatchCollection matches = theRegex.Matches(text);
foreach (Match thisMatch in matches)
{
    // thisMatch.Groups[0].Value is the whole match, e.g. "[URL=/test.aspx?ID=12345]lorem ipsum[/URL]"
    // thisMatch.Groups[1].Value is the URL, e.g. "/test.aspx?ID=12345"
    // thisMatch.Groups[2].Value is the link text, e.g. "lorem ipsum"
    string url = thisMatch.Groups[1].Value;
    string linkText = thisMatch.Groups[2].Value;
}

edited Oct 19, 2011 at 10:38

answered Oct 19, 2011 at 10:31

Michael Low


1 Comment

Christopher W. Brandsdal Over a year ago

Thanks. This was the way I chose to do it.

Sanjay Manohar · Answer · 2011-10-19 10:33:56Z

This sort of thing will work if your text looks exactly like this, i.e. you have no nested URLs and your URL tag is all in capitals:

 @"\[URL=([^\]]*)\]([^\[]*)\[/URL\]"

This should capture two groups: 1 = the stuff after URL=, 2 = the stuff between the [URL=...] and [/URL] marks.

Basically:

As [ and ] are reserved tokens, to match them literally you need to prefix them with backslashes (i.e. "escape" them).
[^\[] matches any character that isn't an open bracket; likewise, [^\]] matches any character that isn't a close bracket.
The parentheses determine the groups that can be captured.
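
The breakdown above can also be written into the pattern itself. The following is only a sketch: the class name AnnotatedPattern and the field UrlTag are invented for illustration. It relies on RegexOptions.IgnorePatternWhitespace, which makes the .NET engine skip unescaped whitespace and # comments in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnnotatedPattern
{
    // Same pattern as in the answer, spread over several lines.
    // IgnorePatternWhitespace makes the engine ignore the layout
    // whitespace and everything from # to the end of each line.
    public static readonly Regex UrlTag = new Regex(@"
        \[URL=       # escaped open bracket, then the literal text URL=
        ([^\]]*)     # group 1: zero or more characters that are not a close bracket
        \]           # escaped close bracket ending the opening tag
        ([^\[]*)     # group 2: zero or more characters that are not an open bracket
        \[/URL\]     # the closing tag, with both brackets escaped
        ", RegexOptions.IgnorePatternWhitespace);

    static void Main()
    {
        Match m = UrlTag.Match("ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis");
        Console.WriteLine(m.Groups[1].Value); // the URL part
        Console.WriteLine(m.Groups[2].Value); // the text between the tags
    }
}
```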

Caveats: nested URL tags won't work, tags that themselves contain square brackets won't work, and quoted strings "..." should also be free from brackets, because the regex does not treat them the way a proper markup parser would.

The only way round this kind of problem, as far as I know, is to do a full parse.

But if you're sure the data doesn't have these sorts of anomaly, you'll be OK!
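
To make two of these caveats concrete, here is a small sketch (the class name CaveatDemo and the sample strings are invented for illustration). It shows that lowercase tags only match once RegexOptions.IgnoreCase is added, and that for a nested tag the regex picks up only the inner pair.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaveatDemo
{
    const string Pattern = @"\[URL=([^\]]*)\]([^\[]*)\[/URL\]";

    // Strict: tags must be upper case. AnyCase: also accepts [url=...][/url].
    public static readonly Regex Strict = new Regex(Pattern);
    public static readonly Regex AnyCase = new Regex(Pattern, RegexOptions.IgnoreCase);

    static void Main()
    {
        string lower = "[url=/x]hi[/url]";
        Console.WriteLine(Strict.IsMatch(lower));   // False
        Console.WriteLine(AnyCase.IsMatch(lower));  // True

        // Nested tags: the outer tag is skipped because its text contains '[',
        // so only the inner tag is reported.
        string nested = "[URL=/a]outer [URL=/b]inner[/URL] tail[/URL]";
        foreach (Match m in Strict.Matches(nested))
        {
            Console.WriteLine(m.Groups[1].Value + " -> " + m.Groups[2].Value); // /b -> inner
        }
    }
}
```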

Fischermaen · Answer · 2011-10-19 10:38:53Z

0

Here is the requested regex, using named groups:

\[URL=(?<url>[^\]]*)\](?<text>[^\[]*)\[/URL\]

You can access the requested values with the following code (Debug.Print lives in System.Diagnostics):

   var regex = new Regex(@"\[URL=(?<url>[^\]]*)\](?<text>[^\[]*)\[/URL\]");
   var matches = regex.Matches(textToSearchIn);

   foreach (Match match in matches)
   {
       Debug.Print("Url: {0} Text: {1}", match.Groups["url"].Value, match.Groups["text"].Value);
   }

answered Oct 19, 2011 at 10:38

Fischermaen
