
I am trying to read files using Python's ftplib without writing them. Something roughly equivalent to:

def get_page(url):
    try:
        return urllib.urlopen(url).read()
    except:
        return ""

but using FTP.

I tried:

def get_page(path):
    try:
        ftp = FTP('ftp.site.com', 'anonymous', 'passwd')
        return ftp.retrbinary('RETR '+path, open('page').read())
    except:
        return ''

but this doesn't work. The only examples in the docs involve writing files using the ftp.retrbinary('RETR README', open('README', 'wb').write) format. Is it possible to read ftp files without writing first?


2 Answers


Well, you have the answer right in front of you: the second argument of FTP.retrbinary is a callback, which is called with each block of data as it is received from the FTP connection. It does not have to be a file's write method.

Here is a simple example:

#!/usr/bin/env python
from ftplib import FTP

def writeFunc(s):
  # s is a bytes object in Python 3, so print it as a separate argument
  print("Read:", s)

ftp = FTP('ftp.kernel.org') 
ftp.login()
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', writeFunc)

In practice, writeFunc should append the data it receives to an internal variable. For example, using a callable object:

#!/usr/bin/env python
from ftplib import FTP

class Reader:
  def __init__(self):
    # retrbinary passes bytes to the callback, so accumulate bytes
    self.data = b""
  def __call__(self, s):
    self.data += s

ftp = FTP('ftp.kernel.org') 
ftp.login()
r = Reader()
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', r)

print(r.data)

Update: I realized that the Python standard library has a class meant for this kind of thing, io.BytesIO:

#!/usr/bin/env python
from ftplib import FTP
from io import BytesIO

ftp = FTP('ftp.kernel.org') 
ftp.login()
r = BytesIO()
ftp.retrbinary('RETR /pub/README_ABOUT_BZ2_FILES', r.write)

print(r.getvalue())
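
To get back to the get_page shape from the question, the BytesIO approach can be wrapped in a small function. This is only a sketch: it assumes you pass in an FTP object that is already connected and logged in (the ftp parameter is my addition, not part of the question's signature), and it returns empty bytes on any FTP or socket error, mirroring the question's "return an empty value on failure" behaviour.

```python
from ftplib import all_errors
from io import BytesIO


def get_page(ftp, path):
    """Return the contents of `path` as bytes without writing a file.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    buf = BytesIO()
    try:
        # retrbinary calls buf.write once for each block it receives
        ftp.retrbinary('RETR ' + path, buf.write)
    except all_errors:
        # all_errors covers ftplib errors as well as OSError and EOFError
        return b''
    return buf.getvalue()
```

With the connection from the examples above this would be called as get_page(ftp, '/pub/README_ABOUT_BZ2_FILES'). If the file is text and you know its encoding, calling .decode() on the result with that encoding gives you a str.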

5 Comments

Awesome, thanks! I didn't realize the callback could be a user defined function
In Python 3, retrbinary needs BytesIO because it passes bytes, not str, to the callback. If you want StringIO, try ftp.retrlines()
@TimRichardson I tried ftp.retelines(f'RETR {filename}') and it raises BrokenPipeError; the traceback shows some problem with the file encoding (at self.sock.sendall(line.encode(self.encoding)) in putline). The file I'm trying to get is an MD5 hash (link: ftp.ncbi.nlm.nih.gov/pubmed/baseline/pubmed23n0003.xml.gz.md5)
@jimmymcheung you have a typo, it is not "retelines"
@TimRichardson thanks, but the typo was only here. I checked the script and it was correct (though now I just use retrbinary and have commented out the retrlines call, because I read the documentation more carefully and found that retrlines does not read the file in "string" mode)

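If the file is text, retrlines makes this even simpler: the callback is called once per line, so passing a list's append method gives you a list of lines directly, much like readlines (except that the trailing line endings are stripped):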

#!/usr/bin/env python
from ftplib import FTP
    
ftp = FTP('ftp.kernel.org') 
ftp.login()
r = []
ftp.retrlines('RETR /pub/README_ABOUT_BZ2_FILES', r.append)

print(r)
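
If you want one string rather than a list, the lines can be joined again afterwards. A minimal sketch, assuming an already connected and logged-in FTP object; read_ftp_text is just a name I picked for illustration. Since retrlines strips the line endings, joining with '\n' normalizes them and drops any trailing newline.

```python
def read_ftp_text(ftp, path):
    """Return the text file at `path` as a single str.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    lines = []
    # retrlines calls lines.append once per line, line ending already removed
    ftp.retrlines('RETR ' + path, lines.append)
    return '\n'.join(lines)
```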

For the reverse process (sending from a BytesIO), see "How can I send a StringIO via FTP in python 3?"
