Regex, return first match after specific word / Python

Question

E.G. We have this sentence.

Sample 987 abc sample 567 xyz, yellow world sample 123

By using this regex = sample \d+

I would like, by using re.findall() to get values next to the sample which is after word abc, which is sample 567 and sample 123

I know how to find the value I need, the problem is that I need to use it AFTER a specific word and not sure how to.

P.S. This word can be changed from abc to word so the result will be sample 123 and e.t.c....

Why not simply include the leading word in the regex as well? — rauberdaniel
– rauberdaniel, Commented Mar 10, 2021 at 14:53
Would it always be possible to distinquish sample from Sample through the capitalized "S". It looks like that word "abc" will only occur before any sample with a lower "s". Therefor \bsample (\d+) may work? — JvdV
– JvdV, Commented Mar 10, 2021 at 14:54
@JvdV this is just example, in simplest terms I need that assigned regex will search ONLY after the selected word (the first one) — Oksana Ok
– Oksana Ok, Commented Mar 10, 2021 at 14:56
@rauberdaniel not sure how to do it and achieve what I need. Return all the values from regext after assigned word and ignore all before this word — Oksana Ok
– Oksana Ok, Commented Mar 10, 2021 at 14:57

rauberdaniel · Accepted Answer · 2021-03-10 15:06:03Z

1

The easiest way might be to limit the regex search to a specific area:

pattern = re.compile(r'sample \d+')
start_pos = original_string.index('your_start_word')
matches = pattern.findall(original_string, start_pos)

answered Mar 10, 2021 at 15:06

rauberdaniel

1,0339 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

JvdV · Accepted Answer · 2021-03-10 15:01:20Z

0

Right, it looks like the following may work for you:

\bsample (\d+)(?!.*\babc\b)

This will assure that the word "abc" is not following, therefor it does not capture '987' from your sample.

See the online demo

\b - A word-boundary.
sample - Match "sample " literally.
(\d+) - Capture 1+ digits in a capture group.
(?!.*\babc\b) - Negative lookahead to prevent it be in front of the word "abc".

For example:

import re
s = 'sample 987 abc sample 567 xyz, yellow world sample 123'
results = re.findall(r'(?<=\bsample )\d+(?!.*\babc\b)', s)
print(results)

Prints:

['567', '123']

answered Mar 10, 2021 at 15:01

JvdV

76.8k8 gold badges48 silver badges89 bronze badges