30

I want to extract an IP address from a string (actually a one-line HTML) using Python.

>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"

-- '165.91.15.131' is what I want!

I tried using regular expressions, but so far I can only get to the first number.

>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']

But I don't have a firm grasp of regular expressions; the above code was found elsewhere on the web and modified.

6 Answers

76

Remove your capturing group:

ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )

Result:

['165.91.15.131']
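
To see why removing the group changes the result, here is a small sketch using the string from the question. When a pattern contains a capturing group, re.findall returns the text of that group rather than the whole match. The variable names are only illustrative.

```python
import re

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# One capturing group: findall returns only what that group captured.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# No capturing group: findall returns the whole match.
without_group = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

# re.search exposes the whole match as group(0) either way.
whole_match = re.search(r'([0-9]+)(?:\.[0-9]+){3}', s).group(0)

print(with_group)     # ['165']
print(without_group)  # ['165.91.15.131']
print(whole_match)    # 165.91.15.131
```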

Notes:

  • If you are parsing HTML it might be a good idea to look at BeautifulSoup.
  • Your regular expression matches some invalid IP addresses such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. You could change the + to {1,3} for a partial fix without making the regular expression overly complex.
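
A rough sketch of the {1,3} suggestion from the second note. The \b word boundaries are an extra assumption not in the answer; without them the bounded pattern would still match the first three digits of an over-long run. This remains a partial fix, since values such as 999 still pass.

```python
import re

# Pattern from the answer: any run of digits counts as an octet.
loose = re.compile(r'[0-9]+(?:\.[0-9]+){3}')

# {1,3} caps each run at three digits; the \b anchors are an added assumption.
bounded = re.compile(r'\b[0-9]{1,3}(?:\.[0-9]{1,3}){3}\b')

samples = ["165.91.15.131", "0.00.999.9999", "0.00.999.999"]
for sample in samples:
    print(sample, loose.findall(sample), bounded.findall(sample))
```

The second sample is rejected by the bounded pattern because of its four-digit run, while the third is still accepted even though 999 is not a valid octet.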
3 Comments

You can use this regular expression to accept only valid IP addresses: "\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b"
Very nice. For those interested in the effect of (?:...), it is described in the docs at docs.python.org/2/library/re.html : "(?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern."
@o0rebelious0o Impressive regexp. For simplicity, and since you're already using Python here, you could use ipaddress.ip_address(ip) to check the address.
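
The ipaddress suggestion in the last comment might look like the sketch below: a simple regex finds dotted-quad candidates, and the standard-library ipaddress module decides which of them are valid. The function name find_valid_ips is hypothetical.

```python
import ipaddress
import re


def find_valid_ips(text):
    """Return dotted-quad candidates in text that ipaddress accepts."""
    candidates = re.findall(r'[0-9]+(?:\.[0-9]+){3}', text)
    valid = []
    for candidate in candidates:
        try:
            # ip_address raises ValueError for anything that is not a valid IP.
            ipaddress.ip_address(candidate)
        except ValueError:
            continue
        valid.append(candidate)
    return valid


print(find_valid_ips("good: 165.91.15.131, bad: 0.00.999.9999"))
```

This keeps the regex short and leaves the range checks to the library, at the cost of a second pass over the candidates.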
6

You can use the following regex to capture only valid IP addresses

re.findall(r'\b25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\b',s)

returns

['165', '91', '15', '131']

1 Comment

Technically, this doesn't match valid IP addresses but valid octets. There can be any number of them, which might need to be checked in a separate step.
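
As the comment notes, the alternation above is not grouped, so each | splits the whole pattern and findall returns individual octets. A possible correction, assuming the intent was four octets separated by dots, wraps each octet in a non-capturing group. The names OCTET and IP_PATTERN are only illustrative.

```python
import re

s = ("<html><head><title>Current IP Check</title></head>"
     "<body>Current IP Address: 165.91.15.131</body></html>")

# One octet, 0-255, grouped so the alternation stays local to the octet.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Four octets joined by literal dots, bounded by word boundaries.
IP_PATTERN = re.compile(r'\b' + OCTET + r'(?:\.' + OCTET + r'){3}\b')

print(IP_PATTERN.findall(s))            # ['165.91.15.131']
print(IP_PATTERN.findall("256.1.1.1"))  # []
```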
5
import re

# Raw string so the backslashes reach the regex engine unchanged
ipPattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

findIP = re.findall(ipPattern, s)

findIP contains ['165.91.15.131']

4

You can use the following regex to extract valid IPs while avoiding these errors found in other patterns:
1. Some accept 123.456.789.111 as a valid IP.
2. Some reject 127.0.00.1 as a valid IP.
3. Some reject IPs that start with a zero, such as 08.8.8.8.

So here is a regex that handles all of the above cases.

Note: I have extracted more than 2 million IPs with the following regex without any problem.

(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)
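
A quick check of the three cases listed above, using re.fullmatch so that each string must match as a whole. The helper name is_ip is hypothetical. The last line illustrates a caveat: the pattern has no boundaries, so when it is used with findall on free text it can return a truncated address.

```python
import re

# Pattern copied from the answer above.
IP_PATTERN = re.compile(
    r'(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}'
    r'(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)'
)


def is_ip(text):
    """True if the whole string matches the pattern."""
    return IP_PATTERN.fullmatch(text) is not None


for case in ["123.456.789.111", "127.0.00.1", "08.8.8.8"]:
    print(case, is_ip(case))

# Without boundaries, an out-of-range last octet is matched partially.
partial = IP_PATTERN.findall("1.2.3.256")
print(partial)  # ['1.2.3.25']
```

If partial matches matter, one option is to surround the pattern with lookarounds such as (?<!\d) and (?!\d); that addition is an assumption and not part of the original answer.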

2 Comments

Can you please elaborate on your regex pattern?
@MohammadZainAbbas. I think that would make for a very long reply. Feel free to try this interactive explanation -> regexr.com/4r3j3
2

The easiest way to find the IP address in the log:

import re

s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"
info = re.findall(r'[\d.-]+', s)

In [42]: info

Out[42]: ['165.91.15.131']

2 Comments

Could you please explain [\d.-]+? What does the '-' after the '.' represent?
[\d.-]+ will grab any number even if the string doesn't contain IPs, e.g. it will grab the IP as well as the numbers 1 and 2 in a sentence like 104.108.71.62: has versionsr: 1 vs. 2
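
The over-matching described in the last comment can be reproduced with the sentence it quotes. The character class [\d.-] accepts any run of digits, dots, and hyphens, so it also picks up the stray numbers and the lone dot after "vs". The stricter pattern shown for comparison is the \d{1,3} form used in other answers here.

```python
import re

sentence = "104.108.71.62: has versionsr: 1 vs. 2"

# Any run of digits, dots, or hyphens counts as a match.
greedy = re.findall(r'[\d.-]+', sentence)

# Requiring four dot-separated groups of digits filters out the noise.
stricter = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', sentence)

print(greedy)    # ['104.108.71.62', '1', '.', '2']
print(stricter)  # ['104.108.71.62']
```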
1

This is how I've done it. I think it's so clean

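import re
import urllib.request  # Python 3; the original used Python 2's urllib2

def get_ip():
    ip_checker_url = "http://checkip.dyndns.org/"
    address_regexp = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
    # read() returns bytes in Python 3, so decode before searching
    response = urllib.request.urlopen(ip_checker_url).read().decode()
    result = address_regexp.search(response)

    # Return the matched address, or None if the page contained no match
    if result:
        return result.group()
    return None

get_ip() returns the IP address as a string, or None if no address is found.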

You can replace address_regexp with another regular expression if you prefer more accurate parsing, or change the web service provider.
