
I wrote the following code:

import re

strings = []

strings.append('merchant ID 1234, device ID 45678, serial# 123456789')
strings.append('merchant ID 8765, user ID 531476, serial# 87654321')
strings.append('merchant ID 1234, device ID 4567, serial# 123456789')
strings.append('merchant ID 1234#56, device ID 45678, serial# 123456789')
strings.append('device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321')

for n in strings:
    expr = re.findall(r'merchant\sID\s\d+|device\sID\s\d+', n)
    if len(expr) == 2:
        print(n)

The task is to scan the 5 strings and print only the strings that contain both 'merchant ID' and 'device ID', and whose ID numbers are legitimate (digits only). So from those 5 strings it should print only the first, third and fifth strings. The code I wrote also prints the fourth string.

How do I fix the code so that it recognizes that the set of digits 1234#56 is not legitimate?
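To see why the fourth string gets through, it helps to look at what re.findall actually returns for it. `\d+` does not have to consume the whole token, so it simply stops at the `#`, and `merchant ID 1234` still counts as a match. A quick check using the question's own pattern:

```python
import re

s = 'merchant ID 1234#56, device ID 45678, serial# 123456789'

# \d+ stops at the first non-digit ('#'), so the partial ID "1234" still matches
matches = re.findall(r'merchant\sID\s\d+|device\sID\s\d+', s)
print(matches)  # ['merchant ID 1234', 'device ID 45678']

# Two matches, so the len(expr) == 2 check in the loop above accepts the string
print(len(matches) == 2)  # True
```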

  • This is an assignment I've been given, and I need to use regex. Commented Dec 21, 2015 at 16:25
  • Your regex also matches "device ID 4567, device ID 4567"; I don't think that's what you need. Commented Dec 21, 2015 at 16:36
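One way to address the second comment, sketched here as an assumption rather than taken from any of the answers: search for each label separately, so a string needs a valid `merchant ID` and a valid `device ID` instead of any two matches. The excluded characters in the `(?![\w#])` lookahead are a choice made for this sample data; it rejects an ID such as `1234#56` because `\d+` can no longer stop early in front of `#` or another digit.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
    'device ID 4567, device ID 4567',  # the case raised in the comment above
]

# An ID counts only if its digits are not followed by another word character or '#'
MERCHANT = re.compile(r'merchant\sID\s\d+(?![\w#])')
DEVICE = re.compile(r'device\sID\s\d+(?![\w#])')


def has_both_ids(s):
    """Return True if s has at least one valid merchant ID and one valid device ID."""
    return bool(MERCHANT.search(s)) and bool(DEVICE.search(s))


selected = [s for s in strings if has_both_ids(s)]
for s in selected:
    print(s)
```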

3 Answers


Here's an example for your specific case: you can replace merchant\sID\s\d+ in your regex with merchant\sID\s\d+(?=[\s,]|$)

Explanation: the newly added part, (?=[\s,]|$), is a lookahead assertion meaning "followed by a whitespace character, a comma, or the end of the string". The $ has to sit outside the character class, because inside [...] it matches a literal dollar sign. See also: https://docs.python.org/2/library/re.html (search for "lookahead assertion")
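A minimal sketch of this change applied to the question's loop. As suggested, only the merchant ID gets the lookahead and the `len(expr) == 2` check is unchanged; end-of-string is written as `|$` outside the character class, since inside `[...]` a `$` is just a literal dollar sign.

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

selected = []
for n in strings:
    # The merchant ID digits must be followed by whitespace, a comma,
    # or the end of the string, so "1234#56" no longer yields a match
    expr = re.findall(r'merchant\sID\s\d+(?=[\s,]|$)|device\sID\s\d+', n)
    if len(expr) == 2:
        selected.append(n)
        print(n)
```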

If you want a generic solution, I'm afraid you'll need to provide more details first, e.g. how you define "without interruption".


4 Comments

@Phoenix: But this also matches lines in which there is only one of the two IDs (line 2 for example), and you said you wanted both to be present?
@TimPietzcker Regarding "But this also matches lines in which there is only one of the two IDs (line 2 for example)": negative. Please read it again.
OK, the regex does match those lines, but the following check for the number of matches filters them out again.
Thank you for pointing that out, @TimPietzcker. It's the OP's code that does what you describe, so if you have suggestions (on design, coding style, etc.), shall we @ him/her instead?

You can use lookaround assertions to specify which characters may or may not precede/follow a number.
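For instance, a lookbehind and a lookahead together can pick out only the numbers that stand on their own, rejecting digits that touch a `#` or another word character on either side. The excluded set `[\w#]` is an assumption chosen for this data, not part of the answer:

```python
import re

s = 'merchant ID 1234#56, device ID 45678, serial# 123456789'

# (?<![\w#]) : the number must not be preceded by a word character or '#'
# (?![\w#])  : the number must not be followed by a word character or '#'
standalone = re.findall(r'(?<![\w#])\d+(?![\w#])', s)
print(standalone)  # ['45678', '123456789']
```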

You can also make use of the lookaround to ensure that both IDs will be matched in any order:

In [9]: for n in strings:
   ...:     print(re.findall(r'(?=.*merchant\sID\s(\d+)\b(?!#)).*device\sID\s(\d+)\b(?!#)', n))
   ...:
[('1234', '45678')]
[]
[('1234', '4567')]
[]
[('8765', '4567')]

Test it live on regex101.com.

Explanation:

(?=             # Assert that the following can be matched:
 .*             # Any number of characters
 merchant\sID\s # followed by "merchant ID "
 (\d+)          # and a number (put that in group 1)
 \b(?![#])      # but only if that number isn't followed by #
)               # End of lookahead
.*              # Then match the actual string, any number of characters,
device\sID\s    # followed by "device ID "
(\d+)           # and a number (put that in group 2)
\b(?![#])       # but only if that number isn't followed by #
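The commented breakdown above can also be used directly as the pattern by compiling it with re.VERBOSE, which ignores whitespace and `#` comments inside the pattern. That is presumably why the breakdown writes `[#]`: in verbose mode a bare `#` would start a comment, while inside a character class it stays literal. A sketch reusing the question's five strings:

```python
import re

PATTERN = re.compile(r"""
    (?=                # Assert that the following can be matched:
     .*                # Any number of characters
     merchant\sID\s    # followed by "merchant ID "
     (\d+)             # and a number (put that in group 1)
     \b(?![#])         # but only if that number isn't followed by #
    )                  # End of lookahead
    .*                 # Then match the actual string, any number of characters,
    device\sID\s       # followed by "device ID "
    (\d+)              # and a number (put that in group 2)
    \b(?![#])          # but only if that number isn't followed by #
""", re.VERBOSE)

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# findall returns (merchant_id, device_id) tuples because the pattern has two groups
results = [PATTERN.findall(n) for n in strings]
for r in results:
    print(r)
```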


You can simply use re.match here to find the strings which start with a specific pattern:

>>> for s in strings:
...     if re.match(r'[^s]+ ID \d+, [^s]+ ID \d+,', s):
...         print(s)
... 
merchant ID 1234, device ID 45678, serial# 123456789
merchant ID 1234, device ID 4567, serial# 123456789
device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321

Demo with an explanation of the pattern: https://regex101.com/r/qA9pY7/1
(In the demo I added ^ to simulate the behavior of re.match.)
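One caveat with this pattern: `[^s]` means "any character except the letter s", not "non-whitespace" (`\S`). On this sample data the second string is rejected only because "user" contains an "s"; with `\S+` it would be printed. A sketch that names the two labels explicitly instead, still anchored at the start by re.match and assuming, as in the sample strings, that the two IDs always come first:

```python
import re

strings = [
    'merchant ID 1234, device ID 45678, serial# 123456789',
    'merchant ID 8765, user ID 531476, serial# 87654321',
    'merchant ID 1234, device ID 4567, serial# 123456789',
    'merchant ID 1234#56, device ID 45678, serial# 123456789',
    'device ID 4567, merchant ID 8765, user ID 531476, serial# 87654321',
]

# Each ID must be labelled "merchant" or "device" and be all digits up to the comma
PATTERN = re.compile(r'(merchant|device) ID \d+, (merchant|device) ID \d+,')

selected = []
for s in strings:
    m = PATTERN.match(s)
    # Also require two different labels, so "device ID 1, device ID 2," is rejected
    if m and m.group(1) != m.group(2):
        selected.append(s)
        print(s)
```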
