0

I have a possibly simple task ahead of me, but my RegEx skills are poor. Can anyone help me, or point me in the right direction? :-)

Below is an example of the text I'm parsing. I would like to do a foreach over the matches, and for each one get the value after "URL=" and the text between the tags:

Lorem ipsum dolor sit amet, consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing elit. Nullam interdum eleifend mauris, nec condimentum nisi lacinia sit amet. Mauris faucibus, orci ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis volutpat, dolor libero sollicitudin quam, id feugiat magna orci[URL=/Default.aspx?ID=333333]lorem ipsum dolor[/URL] quis augue. Integer nec euismod sem.

5
  • -1 for really bad title. Commented Oct 19, 2011 at 10:27
  • This may be some help: regular-expressions.info/tutorial.html Commented Oct 19, 2011 at 10:27
  • How about using the String.IndexOf() API to find the URL value, and then reading from that index up to where the next URL string is found? Hope you're getting the idea. Commented Oct 19, 2011 at 10:28
  • When you feel comfortable enough you can take a look at this gem : shop.oreilly.com/product/9781565922570.do Commented Oct 19, 2011 at 10:43
  • Good suggestions on where to start reading. Commented Oct 19, 2011 at 10:54

3 Answers

4

This should do it for you:

using System.Text.RegularExpressions;

Regex theRegex = new Regex(@"\[URL=([^\]]+)\]([^\[]+)\[/URL\]");
string text = "Lorem ipsum dolor sit amet, consectetur[URL=/test.aspx?ID=12345]lorem ipsum[/URL] adipiscing elit. Nullam interdum eleifend mauris, nec condimentum nisi lacinia sit amet. Mauris faucibus, orci ac[URL=/Default.aspx?ID=222222]lorem[/URL] convallis volutpat, dolor libero sollicitudin quam, id feugiat magna orci[URL=/Default.aspx?ID=333333]lorem ipsum dolor[/URL] quis augue. Integer nec euismod sem.";
MatchCollection matches = theRegex.Matches(text);
foreach (Match thisMatch in matches)
{
    // thisMatch.Groups[0].Value is e.g. "[URL=/test.aspx?ID=12345]lorem ipsum[/URL]"
    // thisMatch.Groups[1].Value is e.g. "/test.aspx?ID=12345"
    // thisMatch.Groups[2].Value is e.g. "lorem ipsum"
}
1 Comment

Thanks. This was the way I chose to do it.
0

This sort of thing will work if your text looks exactly like this, i.e. you have no nested URLs and your URL tag is all in capitals:

 @"\[URL=([^\]]*)\]([^\[]*)\[/URL\]"

This should capture two groups: 1 = the stuff after URL=, 2 = the stuff between the [URL=...] and [/URL] marks.

Basically,

  • [ and ] are reserved characters in regex syntax, so to match them literally you need to prefix them with a backslash (i.e. "escape" them).

  • [^\[] matches any character that isn't an open-bracket.

  • the parentheses determine groups that can be captured.

Caveats: nested URL tags won't work, URLs or link text that themselves contain square brackets won't work, and quoted strings "..." must also be free of brackets, because the regex does not handle them the way a proper markup parser would.

The only way round this kind of problem, as far as I know, is to do a full parse.

But if you're sure the data doesn't have these sorts of anomaly, you'll be OK!
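
To make the caveats concrete, here is a minimal sketch of the pattern in use. The class name UrlTagDemo, the Extract helper, and the sample string are made up for illustration, and RegexOptions.IgnoreCase is my own addition (an assumption, not something the question asked for) so that lower-case [url=...] tags are matched too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlTagDemo
{
    // Same pattern as above. IgnoreCase is an added assumption so that
    // [url=...]...[/url] is accepted as well as the upper-case form.
    private static readonly Regex UrlTag = new Regex(
        @"\[URL=([^\]]*)\]([^\[]*)\[/URL\]",
        RegexOptions.IgnoreCase);

    // Returns (url, text) pairs for every tag whose link text has no '['.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in UrlTag.Matches(input))
        {
            // Group 1 is the value after "URL=", group 2 is the link text.
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    public static void Main()
    {
        // The second tag has brackets inside its link text, so it is skipped.
        string sample =
            "ac[URL=/Default.aspx?ID=222222]lorem[/URL] and [url=/a.aspx]x [b] y[/url]";
        foreach (var pair in Extract(sample))
        {
            Console.WriteLine("Url: {0} Text: {1}", pair.Key, pair.Value);
        }
    }
}
```

With that sample only the first tag is reported; the second one silently drops out because its link text contains a bracket, which is exactly the kind of input where a full parse would be needed.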

0

Here is the requested regex, using named groups:

\[URL=(?<url>[^\]]*)\](?<text>[^\[]*)\[/URL\]

You can access the requested values with the following code:

   var regex = new Regex(@"\[URL=(?<url>[^\]]*)\](?<text>[^\[]*)\[/URL\]");
   var matches = regex.Matches(textToSearchIn);

   foreach (Match match in matches)
   {
       Debug.Print("Url: {0} Text: {1}", match.Groups["url"].Value, match.Groups["text"].Value);
   }
