Regex in Python: find a set of digits without interruption

Question

I wrote the following code:

import re

strings = []

strings.append('merchant ID 1234, device ID 45678, serial# 123456789')
strings.append('merchant ID 8765, user ID 531476, serial# 87654321')
strings.append('merchant ID 1234, device ID 4567, serial# 123456789')
strings.append('merchant ID 1234#56, device ID 45678, serial# 123456789')
strings.append('device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321')




for n in strings:
    expr = re.findall(r'merchant\sID\s\d+|device\sID\s\d+', n)
    if len(expr) == 2:
        print(n)

The goal is to scan the 5 strings and print only those that contain both 'merchant ID' and 'device ID', where each ID number is legitimate (digits only). So of these 5 strings, it should print only the first, third, and fifth. The code I wrote also prints the fourth string.

How do I fix the code so that it recognizes that the set of digits 1234#56 is not legitimate?

Your regex also matches "device ID 4567, device ID 4567", which I think is not what you need.
– Oleg, commented Dec 21, 2015 at 16:36

starrify · Accepted Answer · 2015-12-21 16:30:09Z


Here's an example for your specific case: you can replace merchant\sID\s\d+ in your regex with merchant\sID\s\d+(?=[\s,]|$)

Explained: the newly added part, (?=[\s,]|$), is a lookahead assertion meaning "followed by a whitespace character, a comma, or the end of the string". The end-of-string case is written as a separate alternative because inside a character class $ is a literal dollar sign. See also: https://docs.python.org/2/library/re.html (search for "lookahead assertion")

If you want a generic solution, I'm afraid you'll need to provide more details first, e.g. how you define "without interruption".
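As a minimal sketch of the lookahead idea applied to the whole loop (the names ID_PATTERN and has_both_ids are made up for illustration, and the lookahead is applied to both IDs): the end-of-string case is written as `|$` outside the brackets, and the labels found are collected into a set so that two "device ID" matches are not mistaken for one of each.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# (?=[\s,]|$): the digits must be followed by whitespace, a comma,
# or the end of the string. Group 1 captures which label was matched.
ID_PATTERN = re.compile(r'(merchant|device)\sID\s\d+(?=[\s,]|$)')


def has_both_ids(line):
    """Return True if the line has a valid merchant ID and a valid device ID."""
    labels = {m.group(1) for m in ID_PATTERN.finditer(line)}
    return labels == {'merchant', 'device'}


valid = [s for s in strings if has_both_ids(s)]
for s in valid:
    print(s)
```

With the five sample strings this keeps the first, third, and fifth.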


4 Comments

Tim Pietzcker Over a year ago

@Phoenix: But this also matches lines in which there is only one of the two IDs (line 2 for example), and you said you wanted both to be present?

starrify Over a year ago

@TimPietzcker "But this also matches lines in which there is only one of the two IDs (line 2 for example)": negative. Please read again.

Tim Pietzcker Over a year ago

OK, the regex does match those lines, but the following check for the number of matches filters them out again.

starrify Over a year ago

Thank you @TimPietzcker for pointing that out. It's the OP's code that does what you describe. Shall we @ him/her if you have suggestions (on design, coding style, etc.)?

Tim Pietzcker · Accepted Answer · 2015-12-21 16:37:06Z

You can use lookaround assertions to specify which characters may or may not precede/follow a number.

You can also make use of the lookaround to ensure that both IDs will be matched in any order:

In [9]: for n in strings:
   ...:     print(re.findall(r'(?=.*merchant\sID\s(\d+)\b(?!#)).*device\sID\s(\d+)\b(?!#)', n))
   ...:
[('1234', '45678')]
[]
[('1234', '4567')]
[]
[('8765', '4567')]

Test it live on regex101.com.

Explanation:

(?=             # Assert that the following can be matched:
 .*             # Any number of characters
 merchant\sID\s # followed by "merchant ID "
 (\d+)          # and a number (put that in group 1)
 \b(?![#])      # but only if that number isn't followed by #
)               # End of lookahead
.*              # Then match the actual string, any number of characters,
device\sID\s    # followed by "device ID "
(\d+)           # and a number (put that in group 2)
\b(?![#])       # but only if that number isn't followed by #
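The commented pattern above can be used almost verbatim with re.VERBOSE. This is a sketch rather than the answerer's exact code; the name BOTH_IDS is made up for illustration. In verbose mode an unescaped # starts a comment, so the literal # is kept inside a character class as [#].

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

BOTH_IDS = re.compile(r'''
    (?=                 # Assert that the following can be matched:
        .*              # any number of characters,
        merchant\sID\s  # followed by "merchant ID "
        (\d+)           # and a number (group 1),
        \b(?![#])       # but only if that number isn't followed by #
    )                   # End of lookahead
    .*                  # Then match any number of characters,
    device\sID\s        # followed by "device ID "
    (\d+)               # and a number (group 2),
    \b(?![#])           # but only if that number isn't followed by #
''', re.VERBOSE)

# One list of (merchant, device) tuples per input string.
results = [BOTH_IDS.findall(s) for s in strings]
for r in results:
    print(r)
```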

timgeb · Accepted Answer · 2015-12-21 16:38:18Z


You can simply use re.match here to find the strings which start with a specific pattern:

>>> for s in strings:
...     if re.match(r'[^s]+ ID \d+, [^s]+ ID \d+,', s):
...         print(s)
... 
merchant ID 1234, device ID 45678, serial# 123456789
merchant ID 1234, device ID 4567, serial# 123456789
device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321

Demo with explanation of the pattern: https://regex101.com/r/qA9pY7/1
I added the ^ here to simulate the behavior of re.match.
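Note that [^s] is a character class that excludes the letter s, so the pattern above rejects the second string only because "user" contains an s. A stricter variant, sketched here under the same assumption that the two IDs are the first two fields of the string, names the labels explicitly (FIRST_TWO_IDS is a made-up name):

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# Group 1 captures the first label; (?!\1) then requires the second
# label to be the other one, so either order is accepted.
FIRST_TWO_IDS = re.compile(
    r'(merchant|device) ID \d+, (?!\1)(?:merchant|device) ID \d+,'
)

# re.match anchors at the start of the string, like ^ in the demo.
matched = [s for s in strings if FIRST_TWO_IDS.match(s)]
for s in matched:
    print(s)
```

Unlike the [^s]+ version, this does not accept arbitrary labels such as "client ID 1, vendor ID 2,".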
