I'm trying to extract the string that sits between one of several pairs of delimiter strings. For example:

import re
string1 = 'oi sdfdsf a'
string2 = 'biu serdfd e'
pattern = '(oi|biu)(.*?)(a|e)'
substring = re.search(pattern, string1).group(1)

In this case I should get "sdfdsf" if I use string1 and "serdfd" if I use string2 in the search function. Instead I'm getting "oi" or "biu".

  • why not just str.split? Commented Sep 23, 2021 at 4:48
  • Your match is in Group 2. Just use .group(2) to get it. Just .strip() it afterwards, no need to complicate the regex. Commented Sep 23, 2021 at 8:02
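
To see why `.group(1)` returns the left delimiter, it can help to look at every group of the match. A minimal sketch using the pattern and `string1` from the question (the variable names are only illustrative):

```python
import re

pattern = r'(oi|biu)(.*?)(a|e)'
match = re.search(pattern, 'oi sdfdsf a')

# Groups are numbered by their opening parenthesis, left to right.
whole = match.group(0)    # the entire match
left = match.group(1)     # left delimiter
middle = match.group(2)   # text between the delimiters, spaces included
right = match.group(3)    # right delimiter

print(repr(whole), repr(left), repr(middle), repr(right))
print(repr(middle.strip()))  # surrounding spaces removed
```

Group 2 still carries the spaces on both sides, which is why the comment suggests calling `.strip()` on it.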

2 Answers

If you put part of a pattern in parentheses, the regex captures whatever that part matches as a group. If you want to match some text without capturing it, use a non-capturing group, '(?:...)'.

You can change your pattern as below, so that the only capture group is the one around the text you want.

pattern = r'(?:oi|biu)[ \t]+(\w+)[ \t]+(?:a|e)'
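
A minimal usage sketch of this approach. It assumes that `/t` in the character class was meant to be the tab escape `\t`, and that the wanted text consists only of word characters (`\w+`). Because the delimiters are non-capturing, the wanted text ends up in group 1:

```python
import re

# Assumption: [ \t]+ (one or more spaces or tabs) is what the answer intended.
pattern = r'(?:oi|biu)[ \t]+(\w+)[ \t]+(?:a|e)'

results = []
for s in ('oi sdfdsf a', 'biu serdfd e'):
    m = re.search(pattern, s)
    # re.search returns None when nothing matches, so guard before .group()
    results.append(m.group(1) if m else None)

print(results)
```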

1 Comment

That's indeed the simplest solution. You use a non-greedy capture group. In this case (.*?) will capture any sequence of 0 or more characters, as few as possible, until the next part of the pattern can match.
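
One caveat worth checking: a lazy `(.*?)` stops at the first position where the rest of the pattern can match, and in the question's pattern the right delimiter is a bare `a` or `e`. The sketch below shows what that does to `string2`, along with one possible fix. The stricter pattern is an assumption, not something taken from this thread: it requires whitespace around the captured text and a word boundary after the right delimiter.

```python
import re

s = 'biu serdfd e'

# Original pattern: the lazy group ends at the 'e' inside 'serdfd'.
loose = re.search(r'(oi|biu)(.*?)(a|e)', s).group(2)

# Assumed variant: whitespace on both sides, right delimiter must end a word.
strict = re.search(r'(?:oi|biu)\s+(.*?)\s+(?:a|e)\b', s).group(1)

print(repr(loose))   # ' s'
print(repr(strict))  # 'serdfd'
```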

You are placing capture groups around parts of your regex pattern which you don't really want to capture. Consider this version:

import re

inp = ['oi sdfdsf a', 'biu serdfd e']
for i in inp:
    # Only the middle term is captured; the delimiters use non-capturing groups.
    word = re.findall(r'\b(?:oi|biu) (\S+) (?:a|e)\b', i)[0]
    print(i + ' => ' + word)

Here we turn off the capture groups on the surrounding words on the left and right, and instead use a single capture group around the term you want to capture. This prints:

oi sdfdsf a => sdfdsf
biu serdfd e => serdfd
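
Note that `re.findall(...)[0]` raises `IndexError` when an input contains no match. A small sketch of a guarded variant follows; the helper name `extract_between` is hypothetical, and the pattern is the one from this answer:

```python
import re

PATTERN = re.compile(r'\b(?:oi|biu) (\S+) (?:a|e)\b')

def extract_between(text):
    """Return the word between the delimiters, or None if there is no match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

for s in ['oi sdfdsf a', 'biu serdfd e', 'no delimiters here']:
    print(s, '=>', extract_between(s))
```

Returning `None` lets the caller decide how to handle inputs without delimiters instead of crashing inside the loop.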
