
There are several answers on Stack Overflow about retrieving an FTP file and writing it to a stream, such as a string buffer or a file, which can then be iterated over.

Such as: Read a file in buffer from FTP python

However, these solutions involve loading the entire file into memory or downloading it to disk before beginning to process the contents.

I do not have enough memory to buffer the whole file, and I do not have access to disk. The data can be processed in the callback function, but I want to know if it's possible to wrap the FTP code in some magic that returns an iterator rather than peppering my code with callbacks.

I.e., rather than:

def get_ftp_data(handle_chunk):
    ...
    ftp.login('user', 'password') # authentication required
    ftp.retrbinary('RETR etc', handle_chunk)
    ...

get_ftp_data(do_stuff_to_chunk)

I want:

for chunk in get_ftp_data():
    do_stuff_to_chunk(chunk)

And (unlike existing answers) I want to do it without writing the entire FTP file to disk or holding it all in memory before iterating over it.


1 Answer


You'll have to put the retrbinary call in another thread and have the callback feed blocks to an iterator:

import threading, Queue

def ftp_chunk_iterator(FTP, command):
    # Set maxsize to limit the number of chunks kept in memory at once.
    queue = Queue.Queue(maxsize=some_appropriate_size)

    def ftp_thread_target():
        FTP.retrbinary(command, callback=queue.put)
        queue.put(None)  # sentinel marking the end of the transfer

    ftp_thread = threading.Thread(target=ftp_thread_target)
    ftp_thread.start()

    while True:
        chunk = queue.get()
        if chunk is not None:
            yield chunk
        else:
            return
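
For context, here is a rough usage sketch of the generator above with the standard-library ftplib; the host, credentials, file name, and do_stuff_to_chunk below are placeholders, not anything specified in the question:

from ftplib import FTP

ftp = FTP('ftp.example.com')    # placeholder host
ftp.login('user', 'password')   # placeholder credentials

# Chunks are consumed as the background thread produces them, so at most
# maxsize chunks are ever held in memory at once.
for chunk in ftp_chunk_iterator(ftp, 'RETR somefile.bin'):
    do_stuff_to_chunk(chunk)    # placeholder processing function

ftp.quit()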

If you can't use threads, the best you can do is write your callback as a coroutine:

from contextlib import closing


def process_chunks():
    while True:
        try:
            chunk = yield
        except GeneratorExit:
            finish_up()
            return
        else:
            do_whatever_with(chunk)

with closing(process_chunks()) as coroutine:

    # Get the coroutine to the first yield
    coroutine.next()

    FTP.retrbinary(command, callback=coroutine.send)
# coroutine.close() is called automatically on exiting the with block
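
To show what the coroutine buys over a plain callback, here is a small sketch (in the same Python 2 style as the code above) that keeps running state across chunks; count_bytes and the final print are invented for illustration only:

from contextlib import closing

def count_bytes():
    total = 0
    while True:
        try:
            chunk = yield
        except GeneratorExit:
            # State accumulated across all chunks is available here.
            print 'received %d bytes' % total
            return
        else:
            total += len(chunk)

with closing(count_bytes()) as coroutine:
    coroutine.next()    # advance to the first yield
    FTP.retrbinary(command, callback=coroutine.send)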

Comments

I was afraid of that. Intuitively though, it doesn't seem like something that should absolutely require threads. Also, while I didn't explicitly state this in the original question, my execution environment doesn't have threads. I hope there's a better way.
@natb1: Unfortunately, it does require threads. If you can't use threads, the best you can do is write your callback as a coroutine, and that's less flexible and a lot more mess.
Thanks for introducing me to coroutines. Unfortunately, that example looks to me like a longer-winded way of saying FTP.retrbinary(command, callback=do_whatever_with)
@natb1: It is if do_whatever_with is a simple function, but you can put an arbitrary block of code there with dependence on the state of the coroutine. In cases where it does reduce to FTP.retrbinary(command, callback=do_whatever_with), the iterator would have been unnecessary bloat too.
@user2357112 I like the threaded version. The coroutine one looks at first glance like a simple callback solution, but there is a significant difference: within the process_chunks generator, all the processing (for all chunks) is written in one piece of code which does not return until close(). Really nice. Proposal: what about putting coroutine creation and closing into a with block?
