Find the first regex pattern following a different pattern

Question

Objective: match a second pattern only at its first occurrence after a different pattern.

Background:

I am using the re module in Python 2.7.

I have a specific Regex match that I am having trouble with. I am trying to get the text between the square brackets in the following sample.

  Sample comments:

    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]

    [    ]

I need the line:

98 g/m2 Ctrl (No IP) 95 min 340oC

The problem is that an undetermined number of spaces, tabs, and newlines sits between the search pattern Sample comments: and the text I want to match.

Best Attempt:

I am able to match the first part easily,

match = re.findall(r'Sample comments:[.+\n+]+', string)

But I can't extend the match to capture the portion between the square brackets:

match = re.findall(r'Sample comments:[.+\n+]+\[(.+)\]', string)

My Thinking:

Is there a way to use a regex to find the first instance of the pattern \[(.+)\] after a match of the pattern Sample comments:? Or is there a more robust way to find the text between the square brackets in my example case?

Thanks,

Michael

Not quite clear. Maybe Sample comments:\s*\[(.*?)\s*] will suffice? See ideone.com/FZ5Ee0
– Wiktor Stribiżew, Commented Jul 12, 2016 at 18:45
Yours works, but I don't understand how. Does \s include white-space AND \n? There is definitely a newline in my sample, but it seems to work anyway.
– Michael Molter, Commented Jul 12, 2016 at 18:47
Yes, \s matches any whitespace, vertical and horizontal ones.
– Wiktor Stribiżew, Commented Jul 12, 2016 at 18:48

Wiktor Stribiżew · Accepted Answer · 2016-07-12 18:52:55Z

3

I suggest using

r'Sample comments:\s*\[(.*?)\s*]'

See the regex and IDEONE demo

The point is that \s* matches zero or more whitespace characters, both vertical (line breaks) and horizontal. See the Python re reference:

\s
When the UNICODE flag is not specified, it matches any whitespace character, this is equivalent to the set [ \t\n\r\f\v]. The LOCALE flag has no extra effect on matching of the space. If UNICODE is set, this will match the characters [ \t\n\r\f\v] plus whatever is classified as space in the Unicode character properties database.
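As a quick check of the quoted behaviour, here is a minimal sketch (the variable names are only illustrative) showing that each character in the listed set is matched by \s, and that \s* can therefore bridge blank lines and indentation:

```python
import re

# Each character in the documented set is matched by \s with no flags set.
whitespace_chars = [" ", "\t", "\n", "\r", "\f", "\v"]
all_match = all(re.match(r"\s", ch) is not None for ch in whitespace_chars)
print(all_match)  # True

# \s* therefore spans newlines and indentation between a colon and a bracket.
gap = ":\n\n    \t["
bridged = re.search(r":\s*\[", gap) is not None
print(bridged)  # True
```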

Pattern details:

Sample comments: - a sequence of literal chars
\s* - 0 or more whitespaces
\[ - a literal [
(.*?) - Group 1 (returned by re.findall), capturing zero or more characters other than a newline, as few as possible, up to the first...
\s* - 0+ whitespaces and
] - a literal ] (note it does not have to be escaped outside the character class).
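To see the pattern details in action, here is a small sketch run against a hypothetical string modelled on the question's sample (the exact indentation and blank lines are assumptions). It also runs the pattern from the question for contrast: inside a character class, . and + are literal characters, so [.+\n+] can never match the spaces before the bracket.

```python
import re

# Hypothetical input modelled on the question's sample; the indentation
# before each bracket is an assumption.
text = (
    "Sample comments:\n"
    "\n"
    "    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n"
    "\n"
    "    [    ]\n"
)

# Suggested pattern: \s* bridges the newlines and the indentation.
suggested = re.findall(r"Sample comments:\s*\[(.*?)\s*]", text)
print(suggested)  # ['98 g/m2 Ctrl (No IP) 95 min 340oC']

# Pattern from the question: the class matches only '.', '+' and newline,
# never a space, so the match stops before the indented bracket.
attempted = re.findall(r"Sample comments:[.+\n+]+\[(.+)\]", text)
print(attempted)  # []
```

If only the first occurrence is ever needed, re.search(...).group(1) with the same pattern returns the string directly instead of a list.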

edited Jul 12, 2016 at 18:52

answered Jul 12, 2016 at 18:49

Wiktor Stribiżew


2 Comments

Michael Molter Over a year ago

When I learned regex, I was ambiguously told that \s matches whitespace. I erroneously assumed this meant a space character. As a result, I didn't even think that part of the code was my problem. Thank you!

Wiktor Stribiżew Over a year ago

A space can be matched with a mere [ ] or just a literal space. If you want to match only horizontal whitespace in Python, you can use [ \t] or [^\S\r\n] (these patterns will do in most situations).
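A short sketch of the difference, using a made-up test string: \s+ runs across the line break, while the two horizontal-only classes stop at it.

```python
import re

# Made-up string: space, tab, newline, space between two letters.
sample = "a \t\n b"

any_ws = re.findall(r"\s+", sample)
print(any_ws)  # [' \t\n ']

# Both horizontal-only classes split the run at the newline.
blank_tab = re.findall(r"[ \t]+", sample)
print(blank_tab)  # [' \t', ' ']

not_vertical = re.findall(r"[^\S\r\n]+", sample)
print(not_vertical)  # [' \t', ' ']
```

The class [^\S\r\n] reads as "not a non-whitespace character, and not \r or \n", so unlike [ \t] it also accepts \f and \v.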

Pierre · Accepted Answer · 2016-07-12 18:49:11Z

0

Not sure if I understand your problem correctly, but re.findall('Sample comments:[^\\[]*\\[([^\\]]*)\\]', string) seems to work.

Or maybe re.findall('Sample comments:[^\\[]*\\[[ \t]*([^\\]]*?)[ \t]*\\]', string) if you want to strip the final spaces from your line?
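For comparison, here is a sketch of both patterns written as raw strings (equivalent to the doubled backslashes above) and run on a hypothetical string modelled on the question's sample; the indentation in it is an assumption.

```python
import re

# Hypothetical input modelled on the question's sample.
text = (
    "Sample comments:\n"
    "\n"
    "    [98 g/m2 Ctrl (No IP) 95 min 340oC         ]\n"
    "\n"
    "    [    ]\n"
)

# First pattern: skip everything that is not '[', then capture up to ']'.
loose = re.findall(r"Sample comments:[^\[]*\[([^\]]*)\]", text)
print(repr(loose[0]))  # trailing spaces are kept in the capture

# Second pattern: the lazy group plus [ \t]* drops the padding.
trimmed = re.findall(r"Sample comments:[^\[]*\[[ \t]*([^\]]*?)[ \t]*\]", text)
print(trimmed)  # ['98 g/m2 Ctrl (No IP) 95 min 340oC']
```

Note that [^\[]* skips any characters up to the first bracket, not just whitespace, so these patterns also match when other text sits between the label and the bracket.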

answered Jul 12, 2016 at 18:49

Pierre
