
I need help figuring out a regular expression.

I have

string = "STATE changed from [Fixed] to [Closed], CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]"

I need to grab "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]" and put the values CLOSED[] and TEST CLOSED into two string variables.
So far I have:

Regex NotesChanged = new Regex(@"NOTES changed from \[(\w*|\W*)\] to \[([\w-|\W-]*)\]");

which matches only if "NOTES changed from" starts at the beginning of the string, there is no '[]' inside the '[ ]', and there is no "<br />". My input, however, has "[CLOSED[]<br />]". Any ideas on what to change in the regex?

Thanks, Sharma

  • Is "<br />" going to be there every time? Commented Jan 14, 2011 at 17:41
  • Yes, but that expression doesn't work with the "<br />". I somehow can't get that "<br />" to show up in my post here. Commented Jan 14, 2011 at 17:44

5 Answers


If "<br />" is going to be there every time, you can use one of my favourite patterns (and it's worth memorizing). The pattern is:

delim[^delim]*delim

The pattern above will match a delimiter, followed by anything except the delimiter as many times as possible, then the delimiter again.

Here is the regular expression I would be tempted to use:

NOTES changed from \[([^<]*)[^\]]*\] to \[([^<]*)[^\]]*\]

In English:

  • Matches the opening [
  • Captures, as group 1, all characters up to the < (assuming the br tag is always there)
  • Reads on until the closing ]
  • Repeats the same steps for the second capture group
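
The steps above can be sketched in C# roughly as follows. This is a minimal sketch, assuming the "<br />" tag is always present in both bracketed values; the class name NotesChangeDemo and the helper Extract are made up for illustration and are not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class NotesChangeDemo
{
    // Hypothetical helper: returns the two captured values,
    // or null when the input does not contain a NOTES change.
    public static string[] Extract(string input)
    {
        // Each group captures everything up to the first '<';
        // [^\]]* then skips the "<br />" up to the closing bracket.
        var pattern = @"NOTES changed from \[([^<]*)[^\]]*\] to \[([^<]*)[^\]]*\]";
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return null;
        return new[] { m.Groups[1].Value.Trim(), m.Groups[2].Value.Trim() };
    }

    static void Main()
    {
        var input = "STATE changed from [Fixed] to [Closed], " +
                    "CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], " +
                    "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]";
        string[] values = Extract(input);
        Console.WriteLine(values[0]); // CLOSED[]
        Console.WriteLine(values[1]); // TEST CLOSED
    }
}
```

Regex.Match searches the whole string, so the text before "NOTES changed from" does not need to be matched explicitly. The Trim() call only removes the space left in front of the second "<br />".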

2 Comments

Can you direct me to a good regex tutorial that covers patterns like the one you mentioned?
@sharma, To be honest, I don't know any good resources other than www.regular-expressions.info but that website focuses on syntax rather than patterns. The delimiter pattern came from experience.

This is kind of weird...

(\w*|\W*)

That's a capturing group that matches either word characters zero or more times, or non-word characters zero or more times, but never a mix of the two.

What you want to do when you have matching brackets is to create a pattern which doesn't consume the delimiter.

\[([^\]]+)\]

That will match any occurrence of [with some text in it] where the matched text is the first group in the match.

Since you have the same type of delimiter nested within the string itself, it gets a bit trickier and you need to use a lookahead or some sort of alternation.

((?:[^\[\]]|\[\])*)

This can be further improved, but there's a problem here that cannot be solved if you have [[[]]]. You cannot create a recursive regular expression; it is not that flexible. So you need to either hard-code a maximum depth or apply the regular expression several times.
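
To make the alternation idea concrete, here is a rough C# sketch that wraps the group above in the literal text from the question. The class name OneLevelNestingDemo and the helper Extract are hypothetical, and removing "<br />" afterwards with Replace is an assumption about what the final values should look like.

```csharp
using System;
using System.Text.RegularExpressions;

static class OneLevelNestingDemo
{
    // Matches any non-bracket character, or one literal empty pair "[]".
    const string Inner = @"((?:[^\[\]]|\[\])*)";

    // Hypothetical helper: returns both values, or null if there is no match.
    public static string[] Extract(string input)
    {
        var pattern = @"NOTES changed from \[" + Inner + @"\] to \[" + Inner + @"\]";
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return null;
        // The groups still contain the "<br />" tag, so strip it afterwards.
        return new[]
        {
            m.Groups[1].Value.Replace("<br />", "").Trim(),
            m.Groups[2].Value.Replace("<br />", "").Trim()
        };
    }

    static void Main()
    {
        string[] values = Extract(
            "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]");
        Console.WriteLine(values[0]); // CLOSED[]
        Console.WriteLine(values[1]); // TEST CLOSED

        // Deeper nesting is not handled: this input produces no match.
        Console.WriteLine(Extract("NOTES changed from [a[[]]] to [b]") == null); // True
    }
}
```

The last call shows the limitation described above: only an empty "[]" pair is accepted inside the outer brackets, so anything nested more deeply fails to match.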

A fairly exhaustive way of doing this would be

\[((?:[^\[\]]*)(?:(?=\[)(?:[^\]]*)\])?([^\]]))\]

2 Comments

Thanks for the idea, I wasn't able to capture the CLOSED[] and TEST CLOSED from it, but was able to match them. But it was good to know about the regex, I am just a starter. Thanks again, I do have the solution now
Then you should give a vote to the answers that contributed to your solution. You should also take a closer look at that last example. It's a regular expression, so it looks totally whacked, but it matches the outer brackets and handles one level of nesting. Assuming that the <br/> tag is there might be fine, and since we don't seem to have a formal grammar for this, it doesn't really matter. But I urge you to think this through. There are holes in that approach.

Try adding "\[|\]" to your capture sequence in the bracket group.

Regex NotesChanged = new Regex(@"NOTES changed from \[(\w*|\W*|\[|\])\] to \[([\w-|\W-|\[|\]]*)\]");

Comments


I believe you can use balancing group definitions to match the nested brackets. I believe these are .NET specific, at least in that particular implementation flavor. There's an example on that page, which I've adapted to your input here:

using System;
using System.Text.RegularExpressions;

class Program {
    static void Main (string[] args) {
        var input = "STATE changed from [Fixed] to [Closed], CLOSED DATE added [Fri Jan 14 09:32:19 MST 2011], NOTES changed from [CLOSED[]] to [TEST CLOSED ]";
        // 'open' is pushed for every [ and popped again by 'close-open' for every ]
        var regex = new Regex(@"NOTES changed from (((?'open'\[)[^\[\]]*)+((?'close-open'\])[^\[\]]*)+)*");

        foreach (Match match in regex.Matches(input)) {
            Console.WriteLine(match.Value);
        }
    }
}

This prints NOTES changed from [CLOSED[]] to [TEST CLOSED ] for me. Note that in my adaptation I left off the part of the expression that makes the match fail when the square brackets are not properly balanced, in order to reduce the example to the bare minimum that satisfies your request. The expression is already pretty unpleasantly complex.

EDIT: Just saw your question got edited a bit while I was posting. The parts of the regex I've supplied here that match "anything but [ and ]" should be able to be replaced with capture groups for the substrings you need to extract.
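
As a rough illustration of that last point, the sketch below uses .NET balancing groups together with two named capture groups. It is not the expression from the documentation example. The group names "from", "to" and "open", the class name BalancedNotesDemo and the helper Extract are all chosen here for illustration, and stripping "<br />" afterwards is an assumption about the desired output.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedNotesDemo
{
    // Content of one bracketed value: non-bracket characters, or nested
    // brackets tracked on the 'open' stack. (?(open)(?!)) fails the match
    // if any '[' is still unclosed when the content ends.
    static string Balanced(string name)
    {
        return @"\[(?<" + name + @">(?:[^\[\]]|(?<open>\[)|(?<-open>\]))*(?(open)(?!)))\]";
    }

    // Hypothetical helper: returns both values, or null if there is no match.
    public static string[] Extract(string input)
    {
        var pattern = "NOTES changed from " + Balanced("from") + " to " + Balanced("to");
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["from"].Value.Replace("<br />", "").Trim(),
            m.Groups["to"].Value.Replace("<br />", "").Trim()
        };
    }

    static void Main()
    {
        string[] values = Extract(
            "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]");
        Console.WriteLine(values[0]); // CLOSED[]
        Console.WriteLine(values[1]); // TEST CLOSED
    }
}
```

Unlike a fixed-depth pattern, this version accepts any depth of properly nested brackets inside each value and rejects input whose inner brackets are unbalanced.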

1 Comment

Thanks for the idea, I wasn't able to capture the CLOSED[] and TEST CLOSED from it. But it was good to know about the regex, I am just a starter. Thanks again, I do have the solution now

If you have the luxury of fixing the regex with specific keywords or phrases, the following would work:

NOTES changed from (?:(?:\[)?([A-Z]+\[\]))<br />\] to \[([A-Z]+\s+[A-Z]+)

The above would match the string NOTES changed from [CLOSED[]<br />] to [TEST CLOSED and put CLOSED[] and TEST CLOSED into 2 separate groups.

Update

In fact you can make this even shorter (and a bit more non-specific) by using the . specifier:

NOTES changed from (?:(?:\[)?([A-Z]+\[\])).+\[([A-Z]+\s+[A-Z]+)

This means it will match like the above, only instead of being specific about matching the <br /> tags etc in between it will match regardless of what is in between.
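
A short C# sketch of the shorter pattern, to show both what it captures and where its keyword-specific assumptions bite. The class name LoosePatternDemo and the helper Extract are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LoosePatternDemo
{
    // Hypothetical helper: returns both values, or null if there is no match.
    public static string[] Extract(string input)
    {
        var pattern = @"NOTES changed from (?:(?:\[)?([A-Z]+\[\])).+\[([A-Z]+\s+[A-Z]+)";
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }

    static void Main()
    {
        string[] values = Extract(
            "NOTES changed from [CLOSED[]<br />] to [TEST CLOSED <br />]");
        Console.WriteLine(values[0]); // CLOSED[]
        Console.WriteLine(values[1]); // TEST CLOSED

        // The second group requires two uppercase words, so a
        // single-word value does not match at all.
        Console.WriteLine(
            Extract("NOTES changed from [CLOSED[]<br />] to [REOPENED<br />]") == null); // True
    }
}
```

The second call illustrates the trade-off: the pattern is short, but it only works while the first value is an uppercase word followed by "[]" and the second value is exactly two uppercase words.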
