How to use TCP-based HTTP to download an image in Python?

Question

How can I download an image over TCP-based HTTP in Python? I do download the image, but the viewer says it cannot open the file (which probably means not all of the bytes were received or written). My task is to use the socket library, with no urllib or requests. Any help is appreciated.

serverPort = 80
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect(('google.com', serverPort))
print("ready to receive!")

output = 'GET  http://google.com/favicon.ico HTTP/1.0\r\nHOST: google.com\r\n\r\n'
print(output)
output1 = ('b' + output)
clientSocket.sendall(output1.encode())
reply = b''

while True:
    data = clientSocket.recv(1024)
    if not data:
        break
    reply += data

headers = reply.split(b'\r\n\r\n')[0]
image = reply[len(headers) + 4:]

f = open('image_test.ico', 'wb')
f.write(image)
f.close()

clientSocket.close()

You are doing HTTP with socket? RLY? Use a library like requests.
– Klaus D., Commented Dec 18, 2018 at 7:24
Then add that requirement to your question and explain why you have that requirement since it is very strange.
– Klaus D., Commented Dec 18, 2018 at 7:31
HTTP uses TCP as its underlying transport protocol by definition.
– tripleee, Commented Dec 18, 2018 at 8:18

Hemang Vyas · Accepted Answer · 2018-12-18 08:22:05Z

1

Try this...

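import socket
import select

# Open a TCP connection to the web server on the standard HTTP port.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('google.com', 80))
# HTTP/1.0 request: the server closes the connection after the response.
s.sendall(b'GET /favicon.ico HTTP/1.0\r\n\r\n')

reply = b''

# Keep reading for as long as the socket becomes readable within 3 seconds.
while select.select([s], [], [], 3)[0]:
    data = s.recv(2048)
    if not data:  # empty read: the server closed the connection
        break
    reply += data
s.close()

# The headers end at the first blank line; everything after it is the image.
headers = reply.split(b'\r\n\r\n')[0]
image = reply[len(headers) + 4:]

# save image
with open('google.ico', 'wb') as f:
    f.write(image)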
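
For anyone wondering about the loop condition: select.select([s], [], [], 3) waits up to 3 seconds for s to become readable and returns three lists; the first one is empty if the timeout expired. A rough sketch of that behaviour without any network access, using socket.socketpair() as a stand-in for the server connection (the drain helper and the 0.2 second timeout are made up for this illustration, not part of the answer):

```python
import select
import socket

def drain(sock, timeout=0.2):
    """Collect bytes until the peer closes or `timeout` seconds pass with no data."""
    chunks = []
    # select returns (readable, writable, exceptional); readable is empty on timeout.
    while select.select([sock], [], [], timeout)[0]:
        data = sock.recv(2048)
        if not data:  # readable but empty: the peer closed its end
            break
        chunks.append(data)
    return b''.join(chunks)

# Case 1: the peer stays open, so the loop ends when select times out.
a, b = socket.socketpair()
a.sendall(b'hello')
first = drain(b)

# Case 2: the peer closes, so the loop ends on the empty recv.
a.sendall(b'bye')
a.close()
second = drain(b)
b.close()

print(first, second)
```

The timeout is only a safety net: if the server is slower than the timeout between two chunks, the loop stops early and the file is truncated.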

edited Dec 18, 2018 at 8:22

answered Dec 18, 2018 at 7:31

Hemang Vyas


5 Comments

Raeed Asif Over a year ago

I found a similar solution of yours in another thread as well, and this code works perfectly fine. But when I use s.sendall(b'GET google.com/favicon.ico HTTP/1.0\r\nHOST: google.com\r\n\r\n') it doesn't download the image completely! Can you help me with this?

Hemang Vyas Over a year ago

Try out the updated code and open the .ico file in your browser.

Raeed Asif Over a year ago

Can you explain what this does: 'select.select([s], [], [], 3)[0]:'?

Raeed Asif Over a year ago

What if I try to download it over HTTP/1.1? It gets stuck!

Hemang Vyas Over a year ago

In HTTP/1.1 the server keeps the connection open by default and waits for more requests, whereas in HTTP/1.0 the connection is closed once the response has been transferred.
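
Following up on the HTTP/1.1 point: since the server may keep the connection open, a client cannot wait for an empty recv() to know the body is finished. One common approach is to read the Content-Length header and stop after that many body bytes (another is to send a "Connection: close" header with the request). Below is a minimal sketch of the first approach; read_response is a hypothetical helper name, the fake server is just a socketpair so the example runs offline, and chunked transfer encoding is deliberately not handled:

```python
import socket

def read_response(sock):
    """Read one HTTP response, using Content-Length to find the end of the body."""
    buf = b''
    # Read until the blank line that terminates the headers has arrived.
    while b'\r\n\r\n' not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError('connection closed before headers were complete')
        buf += chunk
    head, _, body = buf.partition(b'\r\n\r\n')

    # Look for Content-Length among the header lines (skipping the status line).
    length = None
    for line in head.split(b'\r\n')[1:]:
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip())
    if length is None:
        raise ValueError('no Content-Length header (chunked encoding is not handled here)')

    # Keep reading until exactly `length` body bytes have been collected.
    while len(body) < length:
        chunk = sock.recv(4096)
        if not chunk:
            break
        body += chunk
    return head, body[:length]

# Offline stand-in for a server that answers and then keeps the connection open.
client, server = socket.socketpair()
server.sendall(b'HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello')
head, body = read_response(client)  # returns without waiting for a close
client.close()
server.close()
print(body)
```

Against a real server the same function would be called on the connected socket after sending the request, assuming the response carries a Content-Length header.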

tripleee · Accepted Answer · 2018-12-18 08:21:49Z

0

You are not creating a bytes object by adding 'b' to the beginning of a string. You are confusing Python's notation for a literal with the actual contents of the value.

b'bytes'

is a sequence of bytes where each element is guaranteed to be a single 8-bit byte corresponding to the ASCII code of the character.

'b' + 'bytes'

is a Unicode string where each element is not guaranteed to be a single byte, but rather, a Python character. It is equivalent to

'bbytes'

or (to be really explicit)

u'bbytes'

The b or u prefix is a signal to the Python interpreter for how the sequence should be stored, not part of the value.

To convert a string to a bytes object, call the string's encode method.

output1 = 'bytes'.encode('us-ascii')
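
The difference is easy to check in an interpreter. A small sketch using a request string similar to the one in the question (the variable names wrong, right and literal are only for this illustration):

```python
request = 'GET /favicon.ico HTTP/1.0\r\nHost: google.com\r\n\r\n'

# Prepending 'b' just adds the letter b to the text that gets sent.
wrong = ('b' + request).encode('us-ascii')

# Encoding the string itself gives the bytes the server expects.
right = request.encode('us-ascii')

# A bytes literal with the b prefix produces the same value.
literal = b'GET /favicon.ico HTTP/1.0\r\nHost: google.com\r\n\r\n'

print(wrong[:5])         # b'bGET '
print(right == literal)  # True
```

So in the question's code the server receives a request line starting with "bGET", which is not a valid HTTP method.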

answered Dec 18, 2018 at 8:21

tripleee
