I have a program that checks a website, and I want to know how I can check it through a proxy in Python.

Here is the code, just as an example:

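import time
import urllib  # Python 2

while True:
    try:
        h = urllib.urlopen(website)
        break
    except IOError:
        # urllib.urlopen raises IOError when the connection fails: log and retry
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)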
4 Answers

Python 3 is slightly different here. It will try to auto-detect proxy settings, but if you need specific or manual proxy settings, consider code like this:

#!/usr/bin/env python3
import urllib.request

# Replace user, pass, server and port with your proxy's details
proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port', 
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
# Make this opener the default used by urllib.request.urlopen()
urllib.request.install_opener(opener)

with urllib.request.urlopen(url) as response:
    html = response.read()  # ... implement things here

See also the relevant section of the Python 3 documentation.
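If you would rather not change the global default with install_opener, one option is to keep the opener in a variable and call its open() method directly; only requests made through that opener then go via the proxy. A minimal sketch, where make_proxy_opener is a hypothetical helper name (not part of urllib) and the proxy address is a placeholder:

```python
import urllib.request


def make_proxy_opener(http_proxy, https_proxy=None):
    """Return an opener that routes requests through the given proxies.

    Hypothetical helper: unlike install_opener(), it leaves the global
    urllib.request.urlopen() default untouched.
    """
    proxies = {'http': http_proxy}
    if https_proxy is not None:
        proxies['https'] = https_proxy
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


opener = make_proxy_opener('http://proxy.example.com:3128')  # placeholder proxy

# Usage (needs a reachable proxy, so it is left commented out here):
# with opener.open('http://www.example.com/', timeout=10) as response:
#     html = response.read()
```

Keeping the opener local makes it easy to use different proxies for different requests in the same program.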


By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
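To check which proxies urllib will pick up from the environment, Python 3's urllib.request.getproxies() returns them as a dict keyed by URL scheme. A minimal sketch; setting os.environ inside the script is only for illustration (normally you export the variable in the shell as above), and the proxy address is a placeholder:

```python
import os
import urllib.request

# For illustration only: normally this is exported in the shell beforehand.
os.environ['http_proxy'] = 'http://myproxy.example.com:1234'

detected = urllib.request.getproxies()
print("Detected proxies:", detected)

# A ProxyHandler() created without arguments reads these same settings.
default_handler = urllib.request.ProxyHandler()
```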

If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:

import urllib  # Python 2

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)

Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?

import time
import urllib  # Python 2

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except IOError:
        # This proxy failed: wait, then move on to the next one
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
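In Python 3, urlopen no longer takes a proxies argument, so the same loop needs a ProxyHandler-based opener for each candidate. A minimal sketch, assuming a hypothetical helper name fetch_via_candidates and treating any OSError (which includes urllib.error.URLError and socket timeouts) as "this proxy failed":

```python
import time
import urllib.request


def fetch_via_candidates(url, candidate_proxies, delay=5, timeout=10):
    """Try each HTTP proxy in turn; return (proxy, body) for the first that works.

    Returns None if every proxy fails. Hypothetical helper, not a urllib API.
    """
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({'http': proxy}))
        try:
            with opener.open(url, timeout=timeout) as response:
                body = response.read()
            print("Got URL using proxy %s" % proxy)
            return proxy, body
        except OSError as exc:
            # URLError and socket timeouts are both subclasses of OSError.
            print("Proxy %s failed (%s); trying next in %s seconds"
                  % (proxy, exc, delay))
            time.sleep(delay)
    return None
```

Usage: `fetch_via_candidates("http://www.google.com", candidate_proxies)`. Returning the working proxy lets the caller reuse it for later requests instead of starting the search again.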

Comments

Using your example, how can I print which proxy is being used at the moment urlopen is called?
@Shady: Just throw in a print statement that prints the value of proxies['http']. Take a look at my updated example to see how it could be done.
OK, thanks. But what if I want to use more proxies, lots of them, say 10, trying one after another?
@Shady: You mean that you want to try a new proxy for each call until you find one that works? Change the proxies argument for each call to urlopen, passing in a new proxy for each call.
In Python 3, urlopen (now urllib.request.urlopen) doesn't have a proxies parameter; that approach is outdated: "Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects."

Here is example code showing how to use urllib to connect via a proxy:

import urllib.request

# authinfo handles HTTP Basic authentication requested by the web server
# (credentials are supplied with authinfo.add_password(...))
authinfo = urllib.request.HTTPBasicAuthHandler()

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.google.com/')

Comments

Can you explain what authinfo is and give an example? Thanks.

For both HTTP and HTTPS, use:

proxies = {'http':'http://proxy-source-ip:proxy-port',
           'https':'https://proxy-source-ip:proxy-port'}

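More entries can be added for other URL schemes, but a dict holds only one proxy per scheme. Repeating the 'http' key does not add a second HTTP proxy; the last value silently replaces the earlier one:

proxies = {'http':'http://proxy1-source-ip:proxy-port',
           'http':'http://proxy2-source-ip:proxy-port'}   # only proxy2 is kept

To use several HTTP proxies, pass them one at a time in separate urlopen calls.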

Usage:

filehandle = urllib.urlopen(external_url, proxies=proxies)

To bypass all proxies (e.g. for links within your own network):

filehandle = urllib.urlopen(external_url, proxies={})
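The Python 3 counterpart of proxies={} is an empty dict passed to ProxyHandler, which disables proxies even when http_proxy or https_proxy is set in the environment. A minimal sketch; the intranet URL in the usage comment is a hypothetical placeholder:

```python
import urllib.request

# ProxyHandler({}) means "use no proxy"; ProxyHandler() with no argument
# would instead read http_proxy/https_proxy from the environment.
no_proxy_handler = urllib.request.ProxyHandler({})
direct_opener = urllib.request.build_opener(no_proxy_handler)

# Usage, e.g. for a host inside your own network (hypothetical URL):
# with direct_opener.open('http://intranet.example.local/', timeout=10) as response:
#     html = response.read()
```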

To authenticate to the proxies with a username and password:

proxies = {'http':'http://username:password@proxy-source-ip:proxy-port',
           'https':'https://username:password@proxy-source-ip:proxy-port'}

Note: avoid special characters such as : and @ in usernames and passwords, since they act as delimiters in the proxy URL.
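If the credentials do contain such characters, an alternative in Python 3 is to percent-encode them: urllib.request.ProxyHandler unquotes the user and password it finds in the proxy URL before building the Proxy-Authorization header. A minimal sketch; build_proxy_url is a hypothetical helper, and the host, port and credentials are placeholders:

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme='http'):
    """Build a proxy URL with percent-encoded credentials (hypothetical helper).

    quote(..., safe='') also encodes ':' and '@', which would otherwise
    be read as URL delimiters.
    """
    return '%s://%s:%s@%s:%d' % (scheme, quote(user, safe=''),
                                 quote(password, safe=''), host, port)


proxy_url = build_proxy_url('alice', 'p@ss:word', 'proxy.example.com', 3128)
opener = urllib.request.build_opener(urllib.request.ProxyHandler({'http': proxy_url}))
```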
