Proxy manager to return proxy format
Good points
The task of your code is very simple, with main good points:
Documentation
Your documentation of load_proxies does not describe what the function does. You can also use doctests:
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
"""
Parse proxies into the format `http://ip:port` or `http://ip:port:user@password`.
>>> load_proxies(['hello:12345', 'hello:12345:apple:orange'])
[{'https': 'http://hello:12345'}, {'https': 'http://hello:12345:apple@orange'}]
>>> load_proxies(['test:4125', 'myserver:1337:usertest:passwordtest'])
[{'https': 'http://test:4125'}, {'https': 'http://myserver:1337:usertest@passwordtest'}]
"""
In the class constructor, note that proxies will be an empty dictionary if proxy_file_path is not passed else it will be a list of parsed proxies. Therefore, your declaration of full_list is wrong if proxy_file_path is not passed in the constructor. Since dictionaries can't be used inside a dictionary, just make the default value a list:
class ProxyManager(object):
def __init__(self, proxy_file_path=None):
self.proxies = self.load_proxies(proxy_file_path) if proxy_file_path else []
If you want to use sets, take a look at frozen sets and change the return type of full_list accordingly (and also its name!).
Performance
When you do proxy_file.readlines(), you're loading all the file into the memory as a list. If you have a small file, it'll work well, but it may raise a MemoryError for large files. So you can replace
for proxy in proxy_file.readlines():
with
for proxy in proxy_file:
Logic
Since 1) it's not an action (such as start or create_proxy) and 2) it doesn't receive arguments, you can use properties instead of methods for full_list and random_proxy:
@property
def random_proxy(self) -> Dict[str, str]:
return random.choice(self.proxies)
@property
def full_list(self) -> List[Dict[str, str]]:
return self.proxies
This allows you to call them
p = ProxyManager("test.txt")
print(p.full_list)
print(p.random_proxy)
Beware the name of full_list: what if you need to change its implementation to another data structure (e.g. a set or a generator)? Existing code that called full_list expecting it to return a list will break. If you want, you can simply name it proxies.
Since proxy_string will always start with "http://", you can already initialize the string with it, replacing
proxy_string = f'{split_string[0]}:{split_string[1]}'
# ...
proxies.append({'https': f'http://{proxy_string}'})
with
proxy_string = f'http://{split_string[0]}:{split_string[1]}'
# ...
proxies.append({'https': proxy_string})
This allows you to evaluate one f-string instead of two.
Since you will always use the first two elements of split_string, you can use tuple unpacking to make the code more readable, replacing
split_string = proxy.strip('\n').split(':')
proxy_string = f'http://{split_string[0]}:{split_string[1]}'
with
ip, port, *args = proxy.strip('\n').split(':')
proxy_string = f'http://{ip}:{port}'
With this, you introduce a new args variable: a list containing all the remaining strings from the split. So, you can check its length and add them to the proxy_string, replacing
if len(split_string) == 4:
proxy_string = f'{proxy_string}:{split_string[2]}@{split_string[3]}'
with
if len(args) == 2:
user, password = args
proxy_string += f":{user}@{password}"
Note that you're doing two tasks inside one function: creating proxies strings that can be formatted as 1) ip:port and 2) ip:post:user:password. It's up to you, but you could extract the logic into two (private) separate functions:
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
def _load_ip_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}` string into a `http://{ip}:{port}` string.
"""
ip, port = value.split(":")
return f"http://{ip}:{port}"
def _load_ipup_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}:{user}:{password}` string into a `http://{ip}:{port}:{user}@{password}` string.
"""
ip, port, user, password = value.split(":")
return f"http://{ip}:{port}:{user}@{password}"
with open(proxy_file_path) as proxy_file:
proxies = []
for proxy in proxy_file:
if proxy.count(":") == 1:
proxies.append({'https': _load_ip_proxy(proxy.strip('\n'))})
elif proxy.count(":") == 3:
proxies.append({'https': _load_ipup_proxy(proxy.strip('\n'))})
return proxies
A variation of the above code using Python's "switch statements":
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
# ...
# Make a dictionary holding the parsing functions according to their `:` counts
parsing_functions = {1: _load_ip_proxy, 3: _load_ipup_proxy}
with open(proxy_file_path) as proxy_file:
proxies = []
# Use `line` instead of `proxy`
for line in proxy_file:
# Define `proxy` here, defining the strip logic in only one place
proxy = line.strip('\n')
# Get the parsing function by the number of `:` in the proxy
parsing_function = parsing_functions.get(proxy.count(":"))
if parsing_function is None:
# Do nothing/raise an exception if proxy does not contain
# neither one nor three `:` characters
continue
proxies.append({'https': parsing_function(proxy)})
return proxies
Refactored code
import random
from typing import List, Dict, Optional
# `class ClassName(object)` is not needed in Python 3.x
# https://stackoverflow.com/a/1203997/9997212
class ProxyManager:
# What is the type of `proxy_file_path`? You can use Optional
# https://docs.python.org/3/library/typing.html#typing.Optional
def __init__(self, proxy_file_path: Optional[str] = None):
# Make `self.proxies` virtually private by adding a `_` prefix
# https://stackoverflow.com/a/1641236/9997212
self._proxies = self.load_proxies(proxy_file_path) if proxy_file_path else []
# Is this method intended to be available to the outside world?
# If not, make it "private" by adding a `_` prefix.
@staticmethod
# What's the type of `proxy_file_path`?
def load_proxies(proxy_file_path: str) -> List[Dict[str, str]]:
"""
Parse proxies into the format `http://ip:port` or `http://ip:port:user@password`.
>>> load_proxies(['hello:12345', 'hello:12345:apple:orange'])
[{'https': 'http://hello:12345'}, {'https': 'http://hello:12345:apple@orange'}]
>>> load_proxies(['test:4125', 'myserver:1337:usertest:passwordtest'])
[{'https': 'http://test:4125'}, {'https': 'http://myserver:1337:usertest@passwordtest'}]
"""
def _load_ip_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}` string into a `http://{ip}:{port}` string.
"""
ip, port = value.split(":")
return f"http://{ip}:{port}"
def _load_ipup_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}:{user}:{password}` string into a `http://{ip}:{port}:{user}@{password}` string.
"""
ip, port, user, password = value.split(":")
return f"http://{ip}:{port}:{user}@{password}"
parsing_functions = {1: _load_ip_proxy, 3: _load_ipup_proxy}
with open(proxy_file_path) as proxy_file:
proxies = []
for line in proxy_file:
proxy = line.strip('\n')
parsing_function = parsing_functions.get(proxy.count(":"))
if parsing_function is None:
continue
proxies.append({'https': parsing_function(proxy)})
return proxies
@property
def random_proxy(self) -> Dict[str, str]:
return random.choice(self._proxies)
@property
def full_list(self) -> List[Dict[str, str]]:
return self._proxies
# If you import this file, this code will ALWAYS be executed, which
# may not be intended. Add a module condition:
# https://stackoverflow.com/q/419163/9997212
if __name__ == '__main__':
p = ProxyManager("test.txt")
print(p.full_list)
print(p.random_proxy)
```