I have the following string:

text = '10.0.0.1.1 but 127.0.0.256 1.1.1.1'

and I want to return only the valid IPv4 addresses. Here that means just 1.1.1.1: 127.0.0.256 has an octet above 255, and 10.0.0.1.1 has too many parts.

So far I have the following, but it doesn't enforce the 0-255 requirement.

text = "10.0.0.1.1 but 127.0.0.256 1.1.1.1"
l = []
import re
for word in text.split(" "):
    if word.count(".") == 3:
        l = re.findall(r"[\d{1,3}]+\.[\d{1,3}]+\.[\d{1,3}]+\.[\d{1,3}]+",word)
1 Answer

Here is a Python regex that does a pretty good job of fetching valid IPv4 addresses from a string:

import re
reValidIPv4 = re.compile(r"""
    # Match a valid IPv4 in the wild.
    (?:                                         # Group two start-of-IP assertions.
      ^                                         # Either the start of a line,
    | (?<=\s)                                   # or preceded by whitespace.
    )                                           # Group two start-of-IP assertions.
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)    # First number in range 0-255 
    (?:                                         # Exactly 3 additional numbers.
      \.                                        # Numbers separated by dot.
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)  # Number in range 0-255 .
    ){3}                                        # Exactly 3 additional numbers.
    (?=$|\s)                                    # End IP on whitespace or EOL.
    """, re.VERBOSE | re.MULTILINE)

text = "10.0.0.1.1 but 127.0.0.256 1.1.1.1"
l = reValidIPv4.findall(text)
print(l)
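
For comparison, a non-regex way to get the same result is to let the standard library do the range check. This is only a sketch: it assumes candidates are separated by whitespace, as in the question's example, and `extract_valid_ipv4` is a hypothetical helper name, not something from the regex above. Keep in mind that `ipaddress.IPv4Address` applies its own rules (for instance, recent Python versions reject octets with leading zeros such as 01.1.1.1, which the regex accepts), so the two approaches may disagree on edge cases.

```python
import ipaddress

def extract_valid_ipv4(text):
    """Return the whitespace-separated tokens of text that parse as IPv4 addresses."""
    valid = []
    for token in text.split():
        try:
            # Raises AddressValueError (a ValueError subclass) for anything
            # that is not a dotted quad with every octet in 0-255.
            ipaddress.IPv4Address(token)
        except ValueError:
            continue
        valid.append(token)
    return valid

text = "10.0.0.1.1 but 127.0.0.256 1.1.1.1"
print(extract_valid_ipv4(text))  # ['1.1.1.1']
```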
7 Comments

I'm confused by your comments: does it start on group two or one? I see that comment listed twice and I'm trying to understand more.
@wannabe_n00b - I can see why you were confused - poor wording on my part. There are actually no capture groups in this regex. The first (non-capturing) group is "grouping two alternatives, each of which is an assertion". I always repeat the comment at the close of each group to associate the start and the end of the group comment-wise.
What would the effect be if I changed your code to use [01]?[0-9]?[0-9]? instead? It seems like it would be better.
@wannabe_n00b - The expression: [01]?[0-9]?[0-9]? matches an empty string (i.e. this matches every position in every string that has ever existed). This won't work because there needs to be at least one digit in each of the 4 IPv4 dotted quad positions.
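
The empty-match problem is easy to see in a quick experiment. The sketch below is a simplified illustration: `dotted_quad` is a hypothetical helper, and the 25[0-5] and 2[0-4][0-9] alternatives from the answer are left out for brevity, so only the part under discussion is compared.

```python
import re

def dotted_quad(part):
    # Hypothetical helper: four copies of `part` joined by literal dots.
    return re.compile(part + r"(?:\." + part + r"){3}")

loose = r"[01]?[0-9]?[0-9]?"   # every piece optional, so it can match nothing
strict = r"[01]?[0-9][0-9]?"   # the middle digit is mandatory

# The loose part matches the empty string; the strict one does not.
print(re.fullmatch(loose, "") is not None)    # True
print(re.fullmatch(strict, "") is not None)   # False

# As a result, three bare dots pass as an "IP" with the loose part.
print(dotted_quad(loose).fullmatch("...") is not None)    # True
print(dotted_quad(strict).fullmatch("...") is not None)   # False

# Both versions still accept a real address.
print(dotted_quad(loose).fullmatch("1.1.1.1") is not None)    # True
print(dotted_quad(strict).fullmatch("1.1.1.1") is not None)   # True
```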
If you had the IP "0.0.0.1", how would the pattern [01]?[0-9][0-9]? evaluate it? Wouldn't the [01]? pick up the digit and then fail on the mandatory [0-9]?
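
The 0.0.0.1 case can be checked directly. As far as I can tell, it works because of backtracking: [01]? first consumes the "0", the mandatory [0-9] then finds nothing left before the dot, so the engine backs up, lets [01]? match nothing, and [0-9] takes the "0" instead. A small sketch, reusing the octet alternation from the answer (the names `octet` and `quad` are only for illustration):

```python
import re

# Octet alternation copied from the answer's regex.
octet = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
quad = re.compile(octet + r"(?:\." + octet + r"){3}")

# A single digit still matches: [01]? gives the digit back to [0-9].
print(re.fullmatch(r"[01]?[0-9][0-9]?", "0").group())   # 0
print(re.fullmatch(r"[01]?[0-9][0-9]?", "1").group())   # 1

# So the whole address is accepted.
print(quad.fullmatch("0.0.0.1") is not None)     # True
print(quad.fullmatch("0.0.0.256") is not None)   # False
```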