0

For example, suppose we have this sentence:

Sample 987 abc sample 567 xyz, yellow world sample 123

and this regex: sample \d+

Using re.findall(), I would like to get only the matches that come after the word abc, which here are sample 567 and sample 123.

I know how to find the values I need. The problem is that I need to match only AFTER a specific word, and I am not sure how to do that.

P.S. The word can change, e.g. from abc to world, in which case the result would be sample 123, and so on.

11
  • Why not simply include the leading word in the regex as well? Commented Mar 10, 2021 at 14:53
  • Would it always be possible to distinguish sample from Sample by the capital "S"? It looks like the word "abc" will only occur before any sample with a lowercase "s". Therefore \bsample (\d+) may work? Commented Mar 10, 2021 at 14:54
  • @JvdV this is just an example. In the simplest terms, I need the given regex to search ONLY after the selected word (its first occurrence). Commented Mar 10, 2021 at 14:56
  • @rauberdaniel I'm not sure how to do that and achieve what I need: return all the regex matches after the given word and ignore everything before that word. Commented Mar 10, 2021 at 14:57
  • re.findall(r'\bsample\s*(\d+)', s)? ideone.com/LL8wO9 Commented Mar 10, 2021 at 14:59

2 Answers

1

The easiest way might be to limit the regex search to a specific area:

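import re

pattern = re.compile(r'sample \d+')
# Start scanning at the first occurrence of the start word (str.index raises ValueError if it is absent)
start_pos = original_string.index('your_start_word')
matches = pattern.findall(original_string, start_pos)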
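
As a rough sketch of the same idea applied to the sentence from the question: the helper name find_after_word and its default pattern are made up for illustration, and returning an empty list when the word is missing is an assumption about the desired behaviour. It looks for the word with word boundaries, so that "abc" inside a longer word is not picked up, and starts the search right after it.

```python
import re

def find_after_word(text, word, pattern=r'sample \d+'):
    """Return all matches of `pattern` found after the first whole-word occurrence of `word`."""
    # Hypothetical helper: locate the marker word as a whole word
    marker = re.search(r'\b' + re.escape(word) + r'\b', text)
    if marker is None:
        return []  # assumption: no marker word means nothing to return
    # Pattern.findall accepts a start position, so everything before it is ignored
    return re.compile(pattern).findall(text, marker.end())

s = 'sample 987 abc sample 567 xyz, yellow world sample 123'
print(find_after_word(s, 'abc'))    # ['sample 567', 'sample 123']
print(find_after_word(s, 'world'))  # ['sample 123']
print(find_after_word(s, 'nope'))   # []
```

Passing the word as a parameter covers the case from the P.S. where the word changes.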

0

Right, it looks like the following may work for you:

\bsample (\d+)(?!.*\babc\b)

This ensures that the word "abc" does not occur anywhere after the match, so it does not capture '987' from your sample.

  • \b - A word-boundary.
  • sample - Match "sample " literally.
  • (\d+) - Capture 1+ digits in a capture group.
  • (?!.*\babc\b) - Negative lookahead to ensure the word "abc" does not appear anywhere after the match.

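For example, using a variant with a lookbehind in place of the capture group (re.findall returns the same digits either way):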

import re
s = 'sample 987 abc sample 567 xyz, yellow world sample 123'
results = re.findall(r'(?<=\bsample )\d+(?!.*\babc\b)', s)
print(results)

Prints:

['567', '123']
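
If the word needs to be configurable, the pattern could be built from a variable. This is only a sketch: the function name after_last_word is hypothetical, and re.escape is used on the assumption that the word should be matched literally. The third call shows a caveat of the lookahead approach: when the word occurs more than once, only the matches after its last occurrence are kept, not those after the first.

```python
import re

def after_last_word(text, word):
    # Hypothetical helper: keep a match only if `word` does not appear later on the same line
    pattern = r'\bsample (\d+)(?!.*\b' + re.escape(word) + r'\b)'
    return re.findall(pattern, text)

s = 'sample 987 abc sample 567 xyz, yellow world sample 123'
print(after_last_word(s, 'abc'))    # ['567', '123']
print(after_last_word(s, 'world'))  # ['123']

repeated = 'sample 1 abc sample 2 abc sample 3'
print(after_last_word(repeated, 'abc'))  # ['3']
```

Note also that if the word is not in the string at all, the lookahead never fails and every match is returned, and that without re.DOTALL the .* does not cross newlines, so the check only covers the rest of the current line.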
