Python Regex Get Text Either side of Specific Characters

Question

I have blocks of text that contain strings like the one below. I need to get the text on either side of "rt", including "rt" itself, but excluding any text or numbers on other lines.

Example:

1.99

  Jim Smith rt Tom Ross

Random

So, here the desired result would be "Jim Smith rt Tom Ross".

I am new to regex and cannot get close. I think I need to lookahead and lookbehind then bound the result in some way but I'm struggling.

Any help would be appreciated.

"I think I need to lookahead and lookbehind then bound the result": good thought. Please add your attempts as code to your question to avoid close votes; there is nothing wrong with posting your efforts. Cheers.
– RavinderSingh13, Commented Jul 29, 2022 at 9:03

RavinderSingh13 · Accepted Answer · 2022-07-29 08:53:53Z

With your shown samples, please try the following regex.

^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$

Python3 code: written and tested in Python 3. It uses the re module's findall function with the re.M flag enabled, so that ^ and $ match at the start and end of each line of the variable's value.

import re
var = """1.99

  Jim Smith rt Tom Ross

Random"""

print(re.findall(r'^\d+(?:\.\d+)?\n+\s+(.*?rt[^\n]+)\n+\s*\S+$', var, re.M))
# Output: ['Jim Smith rt Tom Ross']

Explanation of regex:

^\d+          ##At the start of a line, match 1 or more digits.
(?:\.\d+)?    ##Optionally match, in a non-capturing group, a literal dot followed by 1 or more digits.
\n+\s+        ##Followed by 1 or more newlines, then 1 or more whitespace characters.
(.*?rt[^\n]+) ##In a CAPTURING GROUP, lazily match up to the string rt, then everything after it up to (but not including) the next newline.
\n+\s*\S+$    ##Followed by newline(s), then 0 or more whitespace characters and 1 or more non-whitespace characters at the end of a line.
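
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming the same sample text as in the question, that writes the same regex with Python's re.VERBOSE flag so each piece carries its own comment. The names pattern and result are only illustrative.

```python
import re

# Sample text copied from the question.
var = """1.99

  Jim Smith rt Tom Ross

Random"""

# Same pattern as in the answer, spread over several lines.
# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a comment.
pattern = re.compile(r"""
    ^\d+(?:\.\d+)?   # a number such as 1.99 at the start of a line
    \n+\s+           # newline(s), then the indentation before the target line
    (.*?rt[^\n]+)    # capture the rest of the line that contains 'rt'
    \n+\s*\S+$       # newline(s), then a non-whitespace token ending a line
""", re.MULTILINE | re.VERBOSE)

result = pattern.findall(var)
print(result)  # ['Jim Smith rt Tom Ross']
```

This works here because the pattern contains no literal spaces; a pattern that relies on a literal space would need it written as "\ " or "[ ]" under re.VERBOSE.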

Tim Biegeleisen · Accepted Answer · 2022-07-29 08:45:00Z

We can use re.findall here with an appropriate regex pattern:

import re

inp = """1.99

  Jim Smith rt Tom Ross

Random"""

matches = re.findall(r'\w+(?: \w+)* rt \w+(?: \w+)*', inp)
print(matches)  # ['Jim Smith rt Tom Ross']

Explanation of regex:

\w+ match a single word
(?: \w+)* followed by a space and another word, zero or more times
' rt ' match a literal 'rt' with a single space on each side
\w+ match another word
(?: \w+)* followed by a space and another word, zero or more times
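
Because the pattern only allows single spaces between words, a match cannot run across a newline onto a neighbouring line. Here is a small sketch of that behaviour on several blocks at once; the second block ("Ann Lee rt Bob Ray") is made-up data modelled on the question's sample, not something from the original post.

```python
import re

# Hypothetical multi-block input; only the first block comes from the question.
text = """1.99

  Jim Smith rt Tom Ross

Random

2.50

  Ann Lee rt Bob Ray

Other"""

# Same pattern as in the answer, compiled once for reuse.
pattern = re.compile(r'\w+(?: \w+)* rt \w+(?: \w+)*')

matches = pattern.findall(text)
print(matches)  # ['Jim Smith rt Tom Ross', 'Ann Lee rt Bob Ray']
```

One caveat: \w matches only letters, digits and underscores, so a name containing an apostrophe or hyphen would probably be cut short unless the character class is widened.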

