3

I'm trying to write a program that transfers binary files from the client to the server. Here's the code:

Client (send file)

  def send_file(self,filename):
        print("Sending: " + filename)
        size = self.BUFFER_SIZE
        with open(filename,'rb') as f:
            raw = f.read().decode()
        buffer = [raw[i:i + size] for i in range(0, len(raw), size)]
        for x in range(len(buffer)):
            self.sock.sendall(buffer[x].encode())

        return

Server (recv file)

def recv_file(self, conn, filename):
    packet = ""
    buffer = ""
    while True:
        buffer = conn.recv(self.BUFFER_SIZE)
        packet = packet + str(buffer.decode())
        if not len(buffer) == self.BUFFER_SIZE:
            break
    with open(filename, 'wb') as f:
        f.write(bytes(packet.encode()))
    #print(packet)
    return 

This way I can transfer txt files, but when I have to transfer jpeg or any other type of file, it freezes in the loop. Can someone please explain me why? I'm new to py and i'm trying to learn

2 Answers 2

5

It shouldn't freeze if both sides have the same locale encoding, but it could easily die with an exception.

You're reading and sending as binary (good), but inexplicably decode-ing to str, then encodeing back to bytes (bad). Problem is, arbitrary binary data isn't guaranteed to be decodable in any given locale; if your locale encoding is UTF-8, odds are it's not legal. If it's latin-1 it's legal, but pointless.

Worse, if your client and server have different locale encodings, the result of decoding might be different on each side (and therefore the lengths won't match).

Use bytes consistently, don't convert to and from strings, and locale settings won't matter. Your code will also run faster. You also need to actually send the file length ahead of time; your loop is hoping recv will return a short length only when the file is done, but if:

  1. The file is an exact multiple of the buffer size, or
  2. The socket happens to send data in chunks that don't match the buffer size

you can each get short recv results, by coincidence in case #2, and deterministically in case #1.

A safer approach is to actually prefix your transmission with the file length, rather than hoping the chunking works as expected:

def send_file(self,filename):
    print("Sending:", filename)
    with open(filename, 'rb') as f:
        raw = f.read()
    # Send actual length ahead of data, with fixed byteorder and size
    self.sock.sendall(len(raw).to_bytes(8, 'big'))
    # You have the whole thing in memory anyway; don't bother chunking
    self.sock.sendall(raw)

def recv_file(self, conn, filename):
    # Get the expected length (eight bytes long, always)
    expected_size = b""
    while len(expected_size) < 8:
        more_size = conn.recv(8 - len(expected_size))
        if not more_size:
            raise Exception("Short file length received")
        expected_size += more_size

    # Convert to int, the expected file length
    expected_size = int.from_bytes(expected_size, 'big')

    # Until we've received the expected amount of data, keep receiving
    packet = b""  # Use bytes, not str, to accumulate
    while len(packet) < expected_size:
        buffer = conn.recv(expected_size - len(packet))
        if not buffer:
            raise Exception("Incomplete file received")
        packet += buffer
    with open(filename, 'wb') as f:
        f.write(packet)
Sign up to request clarification or add additional context in comments.

3 Comments

Note: If you're on 3.5 or higher, it would usually make sense to replace the explicit read-all then sendall for the file data with a simple os.stat (to get file size), sendall for the file size only, then use sendfile on the file handle itself. Where the sendfile system call is available, this is much faster (it pushes all the work to the kernel, getting zero copy transmission of data). Even where it's not available, it will chunk the read/send loop so it doesn't hold the whole file in memory at once.
Thanks, really appreciate that. So dividing the file in chunks as I did is pointless?
@MarioValentino: Dividing into chunks makes sense if the goal is to avoid storing the entire file in memory at once. But your original code was loading the whole file into memory eagerly (calling .read() with no arguments), then splitting into chunks (also eagerly), so instead of avoiding storing the whole file in memory, it was storing it twice. It didn't help with the socket operations at all; TCP connections (by default) buffer data to avoid sending small packets, and that buffering effectively re-chunks your data anyway.
1

As an addendum to ShadowRanger's post, If you do want to maintain the file chunking without using socket.sendfile you can utilize a few tricks to clean up your code and reduce memory footprint.

The sending process is fairly simple as we copied the process of sending the file size from ShadowRanger, and added a very simple loop to send chunks of data until the chunk comes up empty (end of the file).

def send_file(self,filename):
    print("Sending: " + filename)
    #send file size as big endian 64 bit value (8 bytes)
    self.sock.sendall(os.stat(filename).st_size.tobytes(8,'big'))
    with open(filename,'rb') as f: #open our file to read
        while True:
            chunk = f.read(self.BUFFER_SIZE) #get next chunk
            if not chunk: #empty chunk indicates EOF
                break
            self.sock.sendall(chunk) #send the chunk

Receiving a file is also very straightforward with the same process to read the expected file size at the beginning, then a loop to read data into that file until we reach our expected size. We then use f.tell() as we receive data as an easy way to tell if the whole file has been sent yet.

def recv_file(self, conn, filename):
    # file size transfer copied from ShadowRanger
    # Get the expected length (eight bytes long, always)
    expected_size = b"" #buffer to read in file size
    while len(expected_size) < 8: #while buffer is smaller than 8 bytes
        more_size = conn.recv(8 - len(expected_size)) #read up to remaining bytes
        if not more_size: #nothing was read
            raise Exception("Short file length received")
        expected_size += more_size #extend buffer
    expected_size = int.from_bytes(expected_size, 'big') #Convert to int, the expected file length
    with open(filename, 'wb') as f: #open our file to write
        while f.tell() < expected_size: #while it's smaller than our expected size
            bytes_recvd = conn.recv() #read any available data 
            f.write(bytes_recvd)

11 Comments

You've got some problems here: 1) You can't recv_into an open file object (you could recv_into a mmap-ed file wrapped in a memoryview that would be used to target the write, but it's probably not worth the trouble). 2) (Minor) There is no need to check the return value from sendall at all (it's documented as returning None on success, and raising an exception on failure, so your while condition is guaranteed to either be true, or unevaluated). Moving the sendall to the end of the loop and changing to while True: would allow you to remove the duplicated read call too.
@ShadowRanger good catch, I saw 'buffer' in the docs and assumed it would work with files.
I would have assumed TCP would take care of any error detection / correction, so that's somewhat surprising.. Perhaps this deserves a new question..
@MarioValentino: My assumption would be that you're intermingling this with other data transmissions, and mistaking part of the end of one transmission for the beginning of the length field of the next. Am I correct that you're sending data beyond just file length and data? Given your original design (where you assumed sending 10 bytes at a time from the client would receive 10 bytes at a time on the server, which is a bad assumption), it's highly likely you're picking up random bytes from earlier sends on the same socket that weren't properly recv when they were intended to be received.
The fact that it works just fine for client to server, but not in reverse, makes it seem fairly likely that there is a bug only in the server to client direction; obviously the basic design should work, but it's easy to make simple mistakes.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.