I am trying to extract the link to an image from a urllib.request response.
I fetch the content of this page: https://drscdn.500px.org/photo/27428737/m%3D900/v2?webp=true&sig=3d3700c82ea515ecc0b66ca265d6909d67861fbe055c0e817b535f75b21c7ebf and try to decode it, but decode("utf-8") fails with the error: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte. I have already checked the page encoding with document.characterSet in the browser console, and it reports UTF-8.
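A note on the error itself: a byte 0xff at position 0 is how a JPEG file begins (FF D8 FF), so it may be worth checking whether the URL returns image data rather than HTML before calling decode. The sketch below only illustrates that check. `sniff_payload` is a hypothetical helper name, not part of urllib, and it recognises just a few common file signatures.

```python
def sniff_payload(body: bytes, content_type: str = "") -> str:
    """Return a rough label for a response body.

    Hypothetical helper: checks a few well-known file signatures first,
    then falls back to the declared media type.
    """
    if body.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if body[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if body[:4] == b"RIFF" and body[8:12] == b"WEBP":
        return "webp"
    # No known image signature: trust the Content-Type header, if given.
    if content_type.lower().startswith("text/"):
        return "text"
    return "unknown"


# With a real response (not executed here), the inputs could come from:
#   resource = urllib.request.urlopen(url)
#   body = resource.read()
#   kind = sniff_payload(body, resource.headers.get_content_type())
# Only a "text" result would make body.decode(...) a reasonable next step.
```

If the result is an image format, the bytes are the image itself and contain no `<img>` tag to search for.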
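import re
import sys
import urllib.request


def ex4():
    url = sys.argv[1]
    # str pattern (the text is decoded before searching); the group
    # captures the whole value of the src attribute.
    r = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"[^>]*>')
    try:
        resource = urllib.request.urlopen(url)
        # This is the call that raises the UnicodeDecodeError for the URL above.
        response = resource.read().decode("utf-8")
        print(response)
        obj = r.search(response)
        if obj:
            print(obj.group(1))
        else:
            print("not found")
    except Exception as e:
        print("error: ", e)


ex4()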
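For comparison, here is a minimal sketch of collecting image links with the standard library's `html.parser` instead of a regular expression. It assumes the decoded text really is HTML. `ImgSrcCollector` and `find_img_sources` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_img_sources(html_text):
    """Return the src values of all <img> tags in html_text, in order."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


sample = '<p><img alt="x" src="a.png"><IMG SRC="b.jpg"/></p>'
print(find_img_sources(sample))
```

A parser of this kind does not depend on attribute order or quoting style, which is where the regular expression approach tends to be fragile.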