The problem

I need to check that the domain in a URL does not resolve to a private IP address before making the request, and I also want to return the IP address that was actually used for the HTTP connection.

This is my test script:

import ipaddress
import requests
import socket
import sys

from urllib.parse import urlparse


def get_ip(url):
    hostname = socket.gethostbyname(urlparse(url).hostname)
    print('IP: {}'.format(hostname))
    if hostname:
        return ipaddress.IPv4Address(hostname).is_private

def get_req(url):
    private_ip = get_ip(url)
    if not private_ip:
        try:
            with requests.Session() as s:
                s.max_redirects = 5
                r = s.get(url, timeout=5, stream=True)
            return {'url': url, 'status_code': r.status_code}
        except requests.exceptions.RequestException:
            return 'ERROR'
    return 'Private IP'

if __name__ == '__main__':
    print(get_req(sys.argv[1]))

This won't work if the domain resolves to multiple IPs, for instance if the website is hosted behind CloudFlare:

# python test.py http://example.com
IP: 104.31.65.106
{'status_code': 200, 'url': 'http://example.com'}

A snippet from tcpdump:

22:21:51.833221 IP 1.2.3.4.54786 > 104.31.64.106.80: Flags [S], seq 902413592, win 29200, options [mss 1460,sackOK,TS val 252001723 ecr 0,nop,wscale 7], length 0
22:21:51.835313 IP 104.31.64.106.80 > 1.2.3.4.54786: Flags [S.], seq 2314392251, ack 902413593, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
22:21:51.835373 IP 1.2.3.4.54786 > 104.31.64.106.80: Flags [.], ack 1, win 229, length 0

The script checked 104.31.65.106, but the HTTP connection was made to 104.31.64.106.

I saw this thread, but I won't be consuming the response body, so the connection won't be released, and my version of the requests module doesn't have these attributes anyway.

Is there a way to achieve this with the requests module, or do I have to use another library like urllib or urllib3?

To clarify: I only need to prevent the request if an attempt would be made to connect to a private network address. If there are multiple options and a public address is picked, it's fine.

  • Why exactly doesn't rsp=requests.get(..., stream=True);rsp.raw._connection.sock.getpeername() work for you? Commented Jun 15, 2017 at 22:14
  • OK, so I just tested it and I guess I could close connection in try/except block but it looks like stream works only if server has keep-alive enabled, otherwise connection is closed immediately and I get AttributeError: 'NoneType' object has no attribute 'getpeername'. I would like to also check IP before request is made. Commented Jun 16, 2017 at 7:26
  • Why all the shenanigans with with requests.Session() as s then s = requests.Session()? That just replaced your configured session, drop the s = ... line. Commented Jun 16, 2017 at 9:14
  • @MartijnPieters I believe that's what was left over from testing; I just removed it. Commented Jun 16, 2017 at 11:15
  • So if a hostname can resolve to either a public address or a private one, you want to block that request? Or only if the currently picked IP address is not a public one? Commented Jun 16, 2017 at 11:17

1 Answer

urllib3 will automatically skip unroutable addresses for a given DNS name. This is not something that needs preventing.

What happens internally when creating a connection is this:

  • DNS information is requested; if your system supports IPv6 (binding to ::1 succeeds), then that includes IPv6 addresses.
  • The addresses are tried one by one, in the order in which they are listed:
    • For each address, a suitable socket is configured.
    • The socket is told to connect to the IP address.
    • If connecting fails, the next IP address is tried; otherwise the connected socket is returned.

See the urllib3.util.connection.create_connection() function. Private networks are usually not routable and are thus skipped automatically.
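
The steps above can be sketched with the standard library alone. This is a simplified illustration of the loop, not urllib3's actual code: connect_first_reachable is a hypothetical name, and it ignores details such as source addresses and socket options that the real function handles.

```python
import socket


def connect_first_reachable(host, port, timeout=5.0):
    """Try each resolved address in order; return the first connected socket.

    Hypothetical helper that mirrors the steps described above.
    """
    last_err = None
    # getaddrinfo() yields one entry per candidate address, in resolver order
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # remember the failure, clean up, and try the next address
            last_err = exc
            if sock is not None:
                sock.close()
    if last_err is not None:
        raise last_err
    raise OSError("getaddrinfo returned no addresses")
```

The address that was actually used is then available from the returned socket via getpeername(), which is why a check made on a separate, earlier DNS lookup can disagree with the connection.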

However, if you are on a private network yourself, then it is possible that an attempt is made to connect to that IP address anyway, which can take some time to resolve.

The solution is to adapt a previous answer of mine that lets you resolve the hostname at the point where the socket connection is created; this should let you skip private use addresses. Create your own loop over socket.getaddrinfo() and raise an exception at that point if a private network address would be attempted:

import socket
from ipaddress import ip_address
from urllib3.util import connection


class PrivateNetworkException(Exception):
    pass


_orig_create_connection = connection.create_connection

def patched_create_connection(address, *args, **kwargs):
    """Wrap urllib3's create_connection to reject private network addresses"""
    # resolve hostname to an ip address; use your own
    # resolver here, as otherwise the system resolver will be used.
    family = connection.allowed_gai_family()

    host, port = address
    last_err = None
    for *_, sa in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        # IPv6 socket addresses are 4-tuples; only host and port are needed
        ip, port = sa[:2]
        if ip_address(ip).is_private:
            # Private network address, raise an exception to prevent
            # connecting
            raise PrivateNetworkException(ip)
        try:
            # try to create connection for this one address
            return _orig_create_connection((ip, port), *args, **kwargs)
        except socket.error as err:
            # remember the failure and move on to the next address
            last_err = err

    # every address failed (or none were returned)
    if last_err is not None:
        raise last_err
    raise socket.error("getaddrinfo returned no addresses for {}".format(host))

connection.create_connection = patched_create_connection

This code resolves the IP addresses for a host just before connecting, and raises a custom exception as soon as a private network address would be attempted. Catch that exception in your request code:

def get_req(url):
    with requests.Session() as s:
        # Session() takes no arguments; max_redirects is set as an attribute
        s.max_redirects = 5
        try:
            r = s.get(url, timeout=5, stream=True)
            return {'url': url, 'status_code': r.status_code}
        except PrivateNetworkException:
            return 'Private IP'
        except requests.exceptions.RequestException:
            return 'ERROR'
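
One thing to be aware of with the is_private test: it is narrower than "not publicly routable". According to the ipaddress documentation, the shared address space 100.64.0.0/10 is reported as neither private nor global. If you would rather reject everything that is not globally routable, a stricter check could look like the sketch below; is_public_address is a hypothetical helper name, and whether this stricter policy is what you want is an assumption on my part.

```python
from ipaddress import ip_address


def is_public_address(ip):
    """Return True only for addresses the ipaddress module considers global.

    Hypothetical helper: stricter than `not ip_address(ip).is_private`.
    """
    return ip_address(ip).is_global


# Loopback and RFC 1918 addresses are flagged by is_private ...
print(ip_address('10.0.0.1').is_private)     # True
print(ip_address('127.0.0.1').is_private)    # True
# ... but the shared address space is neither private nor global.
print(ip_address('100.64.0.1').is_private)   # False
print(is_public_address('100.64.0.1'))       # False
print(is_public_address('104.31.64.106'))    # True
```

In patched_create_connection() you would then test `not ip_address(ip).is_global` instead of `ip_address(ip).is_private`.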

2 Comments

Thanks. Any suggestion on how I could get the IP that the connection was actually made to from response.raw._original_response?
@HTF: I'm going to assume you are using Python 3, and therefore other answers you found on SO that apply to Python 2 no longer work. That's because the socket file is a little more complex now. response.raw._original_response (where response is the requests Response object) is a http.client.HTTPResponse instance, .fp is the socket file, which here consists of a buffer wrapping a SocketIO object with the actual socket in the _sock attribute. So the original socket is available as response.raw._original_response.fp.raw._sock. Call .getpeername() on that.
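
The attribute chain from the comment above can be tried out without requests, since it lives on http.client.HTTPResponse. The sketch below is only an illustration under some assumptions: it talks to a throwaway local server rather than a real site, peer_of and demo are hypothetical names, and .fp.raw._sock are private attributes that may change between Python versions.

```python
import http.client
import http.server
import threading


class _OkHandler(http.server.BaseHTTPRequestHandler):
    """Tiny local test server handler (illustration only)."""
    protocol_version = "HTTP/1.1"  # keep the connection open after the reply
    timeout = 5                    # do not wait forever for a next request

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def peer_of(response):
    """Return the (ip, port) the HTTPResponse's socket is connected to."""
    # buffer -> SocketIO -> socket; private attributes, see the caveat above
    return response.fp.raw._sock.getpeername()


def demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), _OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    response = None
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        # read the peer address before the body is consumed or closed
        return peer_of(response), port
    finally:
        if response is not None:
            response.close()
        conn.close()
        server.shutdown()
        server.server_close()
```

With requests, the same peer_of() call would be applied to response.raw._original_response, before the body has been read.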
