
Objective: find a second pattern and consider it a match only if it is the first time the pattern was seen following a different pattern.

Background:

I am using the re module in Python 2.7.

I have a specific Regex match that I am having trouble with. I am trying to get the text between the square brackets in the following sample.

  Sample comments:

    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]

    [    ]

I need the line:

98 g/m2 Ctrl (No IP) 95 min 340oC

The problem is that there is an undetermined number of spaces, tabs, and newlines between the search pattern Sample comments: and the text I want to match.

Best Attempt:

I am able to match the first part easily,

match = re.findall(r'Sample comments:[.+\n+]+', string)

But I can't get the match to the length I want to grab the portion between the square brackets,

match = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', string)

My Thinking:

Is there a way to use a regex to find the first instance of the pattern \[(.+)\] after a match of the pattern Sample comments:? Or is there a more robust way to find the text between the square brackets in my example?

Thanks,

Michael

  • Not quite clear. Maybe Sample comments:\s*\[(.*?)\s*] will suffice? See ideone.com/FZ5Ee0 Commented Jul 12, 2016 at 18:45
  • Yours works, but I don't understand how. Does \s include white-space AND \n? There is definitely a newline in my sample, but it seems to work anyway. Commented Jul 12, 2016 at 18:47
  • Yes, \s matches any whitespace, vertical and horizontal ones. Commented Jul 12, 2016 at 18:48
  • Man do I feel stupid...thank you. That includes tab-space? Commented Jul 12, 2016 at 18:49
  • Googled it, yes it does. Commented Jul 12, 2016 at 18:50

2 Answers


I suggest using

r'Sample comments:\s*\[(.*?)\s*]'

See the regex and IDEONE demo

The point is that \s* matches zero or more whitespace characters, both vertical (line breaks) and horizontal. See the Python re reference:

\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.

Pattern details:

  • Sample comments: - a sequence of literal chars
  • \s* - 0 or more whitespaces
  • \[ - a literal [
  • (.*?) - Group 1 (returned by re.findall) capturing zero or more characters other than a newline, as few as possible, up to the first...
  • \s* - 0+ whitespaces and
  • ] - a literal ] (note it does not have to be escaped outside the character class).
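As a minimal sketch of the pattern above in use: the `text` variable below is a hypothetical input modelled on the sample in the question, not the asker's real data.

```python
import re

# Hypothetical input modelled on the sample in the question.
text = (
    "Sample comments:\n"
    "\n"
    "    [98 g/m2 Ctrl (No IP) 95 min 340oC" + " " * 9 + "]\n"
    "\n"
    "    [    ]\n"
)

# \s* skips any mix of spaces, tabs and newlines before the first "[".
# The lazy (.*?) followed by \s*] leaves the trailing spaces out of the group.
pattern = re.compile(r'Sample comments:\s*\[(.*?)\s*]')

matches = pattern.findall(text)
print(matches)
```

Because the pattern is anchored on the literal Sample comments:, the second bracket pair, `[    ]`, is not matched on its own.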

2 Comments

When I learned regex, I was told, ambiguously, that \s matches whitespace. I erroneously assumed this meant a space character. As a result, I didn't even think that part of the code was my problem. Thank you!
A space can be matched with a mere [ ] or just a literal space. If you want to match horizontal whitespace in Python, you can use [ \t] or [^\S\r\n] (these patterns will do in most situations).
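A small illustration of the difference between \s and the horizontal-only class mentioned in this comment. The `sample` string is made up for the demonstration.

```python
import re

# Made-up string: a space-tab-space run, then a newline.
sample = "a \t b\nc"

# \s+ matches every whitespace run, including the newline.
all_runs = re.findall(r'\s+', sample)

# [^\S\r\n]+ means "whitespace, but not \r or \n",
# so the newline is left alone.
horizontal_runs = re.findall(r'[^\S\r\n]+', sample)

print(all_runs)
print(horizontal_runs)
```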

Not sure if I understand your problem correctly, but re.findall('Sample comments:[^\\[]*\\[([^\\]]*)\\]', string) seems to work.

Or maybe re.findall('Sample comments:[^\\[]*\\[[ \t]*([^\\]]*?)[ \t]*\\]', string) if you want to strip the spaces and tabs around the captured text?
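A quick comparison of the two patterns in this answer, written with raw strings to avoid the doubled backslashes. This is a sketch against a hypothetical `text` modelled on the question's sample.

```python
import re

# Hypothetical input modelled on the sample in the question.
text = (
    "Sample comments:\n"
    "\n"
    "    [98 g/m2 Ctrl (No IP) 95 min 340oC" + " " * 9 + "]\n"
    "\n"
    "    [    ]\n"
)

# First variant: captures everything up to the closing bracket,
# so the trailing spaces stay in the result.
unstripped = re.findall(r'Sample comments:[^\[]*\[([^\]]*)\]', text)

# Second variant: [ \t]* on both sides of a lazy group drops
# the spaces and tabs around the captured text.
stripped = re.findall(r'Sample comments:[^\[]*\[[ \t]*([^\]]*?)[ \t]*\]', text)

print(repr(unstripped[0]))
print(repr(stripped[0]))
```

Note that `[^\]]*` can also cross newlines, unlike `.*?` without the DOTALL flag, so this approach would capture bracketed text that spans several lines.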
