I am trying to parse IP addresses from a string:

>>> import re
>>> input_str = '''
kjhdkjfh shfkjdsh shfk 1.1.1.1 kaseroi 1.1.1.1 jsoiu 1.1.1.1 
1
1
11
123
132132.23213.213213.123213
23.23.23.23 2321321.33.3.3.3 3.3..3.3.3.3.3. 
3.3.3.3.3.3

3.3.3.3
34.5.6.7
agdi 123213.44.4.5 12.12.12.12
'''
>>> 
>>> 
>>> pattern = r"\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\b"
>>> re.findall(pattern, input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '33.3.3.3', '3.3.3.3', '3.3.3.3', '3.3.3.3', '34.5.6.7', '12.12.12.12']
>>>

But the list of valid IPs I expect is:

['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']

Is there anything wrong with the regex?

2 Answers

You just need to add a negative lookbehind and a negative lookahead to your pattern:

(?<!\.)\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\b(?!\.\d?)

OR

(?<!\S)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?!\S)

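  • (?<!\S) is a negative lookbehind. It asserts that the match is not preceded by a non-whitespace character, so the match must start either after whitespace or at the start of the string.
  • (?!\S) is a negative lookahead. It asserts that the match is not followed by a non-whitespace character, so the match must end either before whitespace or at the end of the string.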

Code:

>>> re.findall(r'(?<!\S)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?!\S)', input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']
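
For reference, here is a minimal self-contained sketch of the second pattern, assuming the only goal is to pick out whitespace-delimited dotted quads. The names OCTET, IPV4_TOKEN and find_ipv4 are made up for this illustration and are not part of the original answer. The sample string is also invented.

```python
import re

# One octet, 0-255 (same alternation as in the pattern above).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])"

# Four octets joined by dots, with whitespace (or a string edge) on both sides.
IPV4_TOKEN = re.compile(r"(?<!\S)" + OCTET + r"(?:\." + OCTET + r"){3}(?!\S)")


def find_ipv4(text):
    """Return every whitespace-delimited IPv4-looking token in text."""
    return IPV4_TOKEN.findall(text)


sample = "a 1.1.1.1 b 2321321.33.3.3.3 3.3.3.3.3.3 256.1.1.1 34.5.6.7"
print(find_ipv4(sample))  # ['1.1.1.1', '34.5.6.7']
```

Note that [0-9]?[0-9] also accepts a two-digit octet with a leading zero, such as 01, so this pattern treats 01.1.1.1 as a match. Whether that is acceptable depends on the data.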

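You cannot rely on \b alone to delimit the match, because . is a non-word character: \b matches between a digit and a dot, so the pattern can match part of a longer dotted sequence such as 3.3.3.3.3.3. In the input string the IPs are delimited by whitespace, so \s is a much better delimiter.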
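
A quick way to see the problem, as a small sketch that is not from the original post: \b matches at any transition between a word character (letter, digit, underscore) and a non-word character, and the dot is a non-word character. The simplified pattern simple_quad below is an assumption made to keep the example short. It skips the 0-255 range check.

```python
import re

# Every digit run in a dotted sequence has a word boundary on both sides,
# because "." is not a word character.
runs = re.findall(r"\b\d+\b", "3.3.3.3.3.3")
print(runs)   # ['3', '3', '3', '3', '3', '3']

# A simplified dotted quad (no 0-255 check) anchored only with \b
# can therefore match inside a longer run of dotted numbers.
simple_quad = r"\b\d{1,3}(?:\.\d{1,3}){3}\b"
stray = re.findall(simple_quad, "2321321.33.3.3.3")
print(stray)  # ['33.3.3.3']
```

This is the same stray 33.3.3.3 that shows up in the question's output.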

Replacing \b with lookarounds for \s serves the purpose:

>>> pattern = r"(?<=\s)(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[0-9]?[0-9])(?=\s)"
>>> re.findall(pattern, input_str)
['1.1.1.1', '1.1.1.1', '1.1.1.1', '23.23.23.23', '3.3.3.3', '34.5.6.7', '12.12.12.12']
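
One caveat, shown here as a small sketch with a simplified octet pattern (\d{1,3}, an assumption made to keep the example short): (?<=\s) and (?=\s) require an actual whitespace character on each side, so an address at the very start or end of the string is skipped. This does not matter for the question's input, because the triple-quoted string begins and ends with a newline. The negated forms (?<!\S) and (?!\S) also accept the string edges.

```python
import re

QUAD = r"\d{1,3}(?:\.\d{1,3}){3}"  # simplified: no 0-255 range check

text = "1.1.1.1 2.2.2.2 3.3.3.3"   # no leading or trailing whitespace

# Positive lookarounds: a whitespace character must be present on both sides.
needs_space = re.findall(r"(?<=\s)" + QUAD + r"(?=\s)", text)

# Negative lookarounds: only forbid a non-whitespace neighbour.
allows_edges = re.findall(r"(?<!\S)" + QUAD + r"(?!\S)", text)

print(needs_space)   # ['2.2.2.2']
print(allows_edges)  # ['1.1.1.1', '2.2.2.2', '3.3.3.3']
```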
