1

How do I download an image over HTTP with a plain TCP socket in Python? My code does download a file, but the image viewer says it cannot open it (which probably means that not all of the bytes were received or written). My task requires the socket library; urllib and requests are not allowed. Any help is appreciated.

from socket import *

serverPort = 80
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect(('google.com', serverPort))
print("ready to receive!")

output = 'GET  http://google.com/favicon.ico HTTP/1.0\r\nHOST: google.com\r\n\r\n'
print(output)
output1 = ('b' + output)
clientSocket.sendall(output1.encode())
reply = b''

while True:
    data = clientSocket.recv(1024)
    if not data:
        break
    reply += data

headers = reply.split(b'\r\n\r\n')[0]
image = reply[len(headers) + 4:]

f = open('image_test.ico', 'wb')
f.write(image)
f.close()

clientSocket.close()
4 Comments
  • You are doing HTTP with socket? RLY? Use a library like requests. Commented Dec 18, 2018 at 7:24
  • Yeah, it's a requirement to use socket. Commented Dec 18, 2018 at 7:28
  • Then add that requirement to your question and explain why you have that requirement since it is very strange. Commented Dec 18, 2018 at 7:31
  • HTTP uses TCP as its underlying transport protocol by definition. Commented Dec 18, 2018 at 8:18

2 Answers

1

Try this...

import socket
import select

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('google.com', 80))
s.sendall(b'GET /favicon.ico HTTP/1.0\r\n\r\n')

reply = b''

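# Wait up to 3 seconds for the socket to become readable; stop on a
# timeout or when the server closes the connection (recv returns b'').
while select.select([s], [], [], 3)[0]:
    data = s.recv(2048)
    if not data:
        break
    reply += data

# The header block ends at the first blank line; the rest is the image.
headers = reply.split(b'\r\n\r\n')[0]
image = reply[len(headers) + 4:]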

# save image
with open('google.ico', 'wb') as f:
    f.write(image)
s.close()
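
The loop condition above relies on select.select, so here is a small sketch of what that call returns. It assumes a local socket pair (socket.socketpair) as a stand-in for the real server; the names client, server, idle and ready are only illustrative.

```python
import select
import socket

# A connected pair of local sockets stands in for client and server.
client, server = socket.socketpair()

# Nothing has been sent yet: select waits 0.2 s, then returns empty lists.
readable, _, _ = select.select([client], [], [], 0.2)
idle = bool(readable)

# Once data is queued, select returns the socket in the readable list.
server.sendall(b'ping')
readable, _, _ = select.select([client], [], [], 3)
ready = bool(readable)
data = client.recv(2048)

print(idle, ready, data)

client.close()
server.close()
```

So select.select([s], [], [], 3)[0] is an empty list (falsy) when no data arrives within 3 seconds, which ends the while loop. The trade-off is that a server that pauses for longer than the timeout would be cut off early.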

5 Comments

I found a similar solution of yours in another thread as well, and this code works perfectly fine. But when I use s.sendall(b'GET google.com/favicon.ico HTTP/1.0\r\nHOST: google.com\r\n\r\n'), it doesn't download the image completely! Can you help me with this?
Try out the updated code and open the .ico file in your browser.
Can you explain what 'select.select([s], [], [], 3)[0]' does?
What if I try to download it over HTTP/1.1? It gets stuck!
In HTTP/1.1 the server keeps the connection open by default and waits for more requests, so a loop that waits for the connection to close can block. In HTTP/1.0 the server closes the connection once the response has been sent.
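
One way to avoid waiting for the server to close the connection is to read exactly as many body bytes as the Content-Length header announces. The following is a minimal sketch under that assumption (it does not handle chunked transfer encoding or redirects); read_response and fetch are hypothetical helper names, not part of the socket library.

```python
import socket


def read_response(sock):
    """Read one HTTP response from sock and return (head, body) as bytes.

    Assumes the server sends Content-Length; otherwise reads until close.
    """
    buf = b''
    # Read until the blank line that ends the header block.
    while b'\r\n\r\n' not in buf:
        data = sock.recv(4096)
        if not data:
            break
        buf += data
    head, _, body = buf.partition(b'\r\n\r\n')

    # Find Content-Length; header names are case-insensitive.
    length = None
    for line in head.split(b'\r\n')[1:]:
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip())

    # Keep reading until the whole body has arrived or the peer closes.
    while length is None or len(body) < length:
        data = sock.recv(4096)
        if not data:
            break
        body += data
    if length is not None:
        body = body[:length]
    return head, body


def fetch(host, path):
    """Send an HTTP/1.1 GET and return (head, body). Not called here."""
    request = ('GET {} HTTP/1.1\r\n'
               'Host: {}\r\n'
               'Connection: close\r\n\r\n').format(path, host)
    with socket.create_connection((host, 80)) as s:
        s.sendall(request.encode('ascii'))
        return read_response(s)
```

HTTP/1.1 requires the Host header, and sending Connection: close additionally asks the server to close the connection after the response, so the simpler read-until-empty loop would also terminate. It is worth checking the status line in head before saving body, since a redirect or error page is not an image.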
0

You are not creating a byte object by adding 'b' to the beginning of a string. You are mixing Python's representation with the actual contents.

b'bytes'

is a sequence of bytes where each element is guaranteed to be a single 8-bit byte corresponding to the ASCII code of the character.

'b' + 'bytes'

is a Unicode string where each element is a Unicode code point rather than a single byte. It is equivalent to

'bbytes'

or (to be really explicit)

u'bbytes'

The b or u prefix is a signal to the Python interpreter for how the sequence should be stored, not part of the value.

To convert a string to a bytes object, call the string's encode method.

output1 = 'bytes'.encode('us-ascii')
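
Applied to the request line from the question, the difference can be checked directly. This is only an illustrative sketch; the variable names wrong and right are made up for the comparison.

```python
request = 'GET /favicon.ico HTTP/1.0\r\nHost: google.com\r\n\r\n'

# Concatenating 'b' puts a literal letter b in front of the request text.
wrong = ('b' + request).encode('ascii')

# Encoding the string on its own yields the intended bytes.
right = request.encode('ascii')

print(wrong[:5])  # b'bGET '
print(right[:4])  # b'GET '
```

With the extra letter, the method the server sees is bGET instead of GET, so whatever it sends back is unlikely to be the image.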
