I have text like:

sometext...one=1290...sometext...two=12985...sometext...three=1233...

How can I find one=1290 and two=12985 but not three or four or five? There can be 4 or 5 digits after the =. I tried this:

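import re

# sample input from above
sometext = 'sometext...one=1290...sometext...two=12985...sometext...three=1233...'
pattern = r"(one|two)+=+(\d{4,5})+\D"
found = re.findall(pattern, sometext, flags=re.IGNORECASE)
print(found)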

It gives me results like: [('one', '1290')]. If I use pattern = r"((one|two)+=+(\d{4,5})+\D)" it gives me [('one=1290', 'one', '1290')]. How can I get just one=1290?

3 Answers

4

You were close. You need to use a single capture group (or none for that matter):

((?:one|two)+=+\d{4,5})+

Full code:

import re

string = 'sometext...one=1290...sometext...two=12985...sometext...three=1233...'

pattern = r"((?:one|two)+=+\d{4,5})+"
found = re.findall(pattern, string, flags=re.IGNORECASE)
print(found)
# ['one=1290', 'two=12985']

1 Comment

Exactly this. To summarize what the findall() method returns, remember the following: When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, findall() returns a list of string matches, such as ['123-456-7890', '987-654-3210']. When called on a regex that has several groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), findall() returns a list of tuples of strings (one string for each group), such as [('123', '456', '7890'), ('987', '654', '3210')]. When the regex has exactly one group, findall() returns a list of the strings matched by that group. That is the case here: the outermost group is the only capture group, so its text is returned, which gives the answer(s) you require.
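As a small sketch of the three cases of findall() (the sample text and variable names here are only illustrative, not from the question):

```python
import re

# Illustrative sample text (an assumption, not from the question).
text = "call 123-456-7890 or 987-654-3210"

# No groups: each item is the whole match.
no_groups = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# Exactly one group: each item is that group's text (a string, not a tuple).
one_group = re.findall(r"(\d{3})-\d{3}-\d{4}", text)

# Several groups: each item is a tuple with one string per group.
many_groups = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)

print(no_groups)    # ['123-456-7890', '987-654-3210']
print(one_group)    # ['123', '987']
print(many_groups)  # [('123', '456', '7890'), ('987', '654', '3210')]
```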
1

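Make the inner groups non-capturing, and keep the trailing \D outside the remaining group so that the character after the digits is not part of the result: ((?:one|two)+=+(?:\d{4,5})+)\D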
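A quick check of how the position of the trailing \D relative to the capturing group affects the result. This is a sketch using the sample string from the question; the names inside and outside are just for illustration:

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

# \D inside the group: the non-digit after the number is captured too.
inside = re.findall(r"((?:one|two)+=+(?:\d{4,5})+\D)", text, flags=re.IGNORECASE)

# \D outside the group: it must still match, but is not returned.
outside = re.findall(r"((?:one|two)+=+(?:\d{4,5})+)\D", text, flags=re.IGNORECASE)

print(inside)   # ['one=1290.', 'two=12985.']
print(outside)  # ['one=1290', 'two=12985']
```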

1

You are getting results like [('one', '1290')] rather than one=1290 because you are using capture groups. Use:

r"(?:one|two)=(?:\d{4,5})(?=\D)"
  • I have removed the additional + repeaters, as they were (I think?) unnecessary. You don't want to match things like oneonetwo===1234, right?
  • Using (?:...) rather than (...) defines a non-capture group. This prevents the result of the capture from being returned, and you instead get the whole match.
  • Similarly, using (?=\D) defines a look-ahead - so this is excluded from the match result.
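For reference, a minimal runnable sketch of the pattern above on the sample string from the question. One thing to be aware of: (?=\D) needs a non-digit character after the number, so a value at the very end of the string is not matched. The negative look-ahead (?!\d) shown here is a possible variant for that case, added as an assumption about what the input might look like:

```python
import re

text = "sometext...one=1290...sometext...two=12985...sometext...three=1233..."

# No capture groups, so findall() returns the whole match.
pattern = re.compile(r"(?:one|two)=\d{4,5}(?=\D)", flags=re.IGNORECASE)
found = pattern.findall(text)
print(found)  # ['one=1290', 'two=12985']

# Hypothetical input where the number is the last thing in the string.
at_end = "sometext...one=1290"
print(pattern.findall(at_end))  # []

# Variant: "not followed by a digit" also succeeds at the end of the string.
variant = re.compile(r"(?:one|two)=\d{4,5}(?!\d)", flags=re.IGNORECASE)
found_at_end = variant.findall(at_end)
print(found_at_end)  # ['one=1290']
```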
