I have blocks of text that contain strings like the one below. I need to get the text on either side of "rt", including "rt" itself, but excluding any text or numbers on other lines.

Example:

1.99

  Jim Smith rt Tom Ross

Random

So, here the desired result would be "Jim Smith rt Tom Ross".

I am new to regex and cannot get close. I think I need to use a lookahead and a lookbehind and then bound the result in some way, but I'm struggling.

Any help would be appreciated.

  • "I think I need to lookahead and lookbehind then bound the result": good thought. Please add your attempts as code to your question to avoid close votes; there is nothing right or wrong about posting your efforts. Cheers. Commented Jul 29, 2022 at 9:03

2 Answers

With the samples you have shown, please try the following regex; an online demo of it was linked from this answer.

^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$

Python3 code: this is written and tested in Python 3. It uses the findall function of the re module with the re.M flag enabled, so that ^ and $ match at the start and end of each line in the variable's value.

import re
var = """1.99

  Jim Smith rt Tom Ross

Random"""

print(re.findall(r'^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$', var, re.M))
# ['Jim Smith rt Tom Ross']

Explanation of regex:

^\d+          ##From the start of a line, match 1 or more digits.
(?:\.\d+)?    ##In an optional non-capturing group, match a literal dot followed by 1 or more digits.
\n+\s+        ##Followed by 1 or more newlines, then 1 or more whitespace characters.
(.*?rt[^\n]+) ##In a CAPTURING GROUP, lazily match up to the string rt, then match the rest of that line up to (not including) the next newline.
\n+\s*\S+$    ##Followed by newline(s), then 0 or more whitespace characters and 1 or more non-whitespace characters at the end of a line.
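
As a rough illustration of how the pattern above behaves on more than one block, here is a minimal sketch. The second block (2.50 / Ann Lee rt Bob Ray / Other) is made-up sample data, and the name PATTERN is only for this example; it assumes every block follows the same number, name line, word layout as in the question.

```python
import re

# Same regex as above, compiled with re.M so ^ and $ match at line boundaries.
PATTERN = re.compile(r'^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$', re.M)

# Hypothetical input: two blocks in the number / name line / word layout.
blocks = """1.99

  Jim Smith rt Tom Ross

Random
2.50

  Ann Lee rt Bob Ray

Other"""

# findall returns the contents of the single capturing group for each match.
print(PATTERN.findall(blocks))
# ['Jim Smith rt Tom Ross', 'Ann Lee rt Bob Ray']
```

Because the pattern is anchored on the number line before and the word line after, a name line on its own (with no surrounding lines) would not match.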

We can use re.findall here with an appropriate regex pattern:

import re

inp = """1.99

  Jim Smith rt Tom Ross

Random"""

matches = re.findall(r'\w+(?: \w+)* rt \w+(?: \w+)*', inp)
print(matches)  # ['Jim Smith rt Tom Ross']

Explanation of regex:

  • \w+ match a single word
  • (?: \w+)* followed by a space and another word, zero or more times
  • ' rt ' match a space, then 'rt', then another space
  • \w+ match another word
  • (?: \w+)* which is followed by a space and another word, zero or more times
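
Since \w+ only matches letters, digits and underscores, the pattern above would stop at punctuation such as a hyphen or an apostrophe inside a name. If that matters for your data, one possible alternative (a different technique from the regex above, not a replacement for it) is to go line by line and keep any line that has "rt" surrounded by whitespace with text on both sides. This is only a sketch; the function name lines_with_rt is made up for the example.

```python
import re

def lines_with_rt(text):
    """Return each stripped line that has text, whitespace, 'rt', whitespace, text."""
    # \S\s+rt\s+\S requires a non-space character on both sides of a
    # whitespace-delimited 'rt', so "Robert" or a bare "rt" line is skipped.
    marker = re.compile(r'\S\s+rt\s+\S')
    return [line.strip() for line in text.splitlines() if marker.search(line)]

sample = """1.99

  Jim Smith rt Tom Ross

Random"""

print(lines_with_rt(sample))  # ['Jim Smith rt Tom Ross']
```

Working per line means text or numbers on other lines can never leak into the result, at the cost of returning the whole line rather than a tightly bounded match.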
