Python programming with FTP and lists

Question

My main goal is to check an FTP server at any time for new files and then generate a .txt file listing only the new files. If there are no new files, it should produce nothing. Here is what I have so far. I started by saving the server's file listing to oldlist.txt. Each time I connect to the FTP site, I save the current listing to newlist.txt, compare it with oldlist.txt, and write the differences to "Temporary FTP file Changes.txt". After each run, newlist.txt becomes oldlist.txt so that I can compare against it the next time I connect. Is there a better way to do this? My lists never seem to change between runs. Sorry if this is confusing, thanks.

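import os
from ftplib import FTP

filename = "oldlist.txt"
testing = "newlist.txt"
tempfilename = "Temporary FTP file Changes.txt"

# Read the previous listing. readlines() would keep the trailing "\n" on
# every line, so no entry would ever match a line returned by ftp.dir()
# and every file would look new; strip the newline before comparing.
with open(filename, "r") as old:
    oldlist = [line.rstrip("\n") for line in old]
oldlist.sort()

# Fetch the current directory listing, one line per entry.
ftp = FTP("ftpsite", "username", "password")
ftp.set_pasv(False)
newlist = []
ftp.dir(newlist.append)
newlist.sort()
ftp.close()

# Save the current listing for the next run.
with open(testing, "w") as bob:
    for nl in newlist:
        bob.write(nl + "\n")

# Write only the entries that were not in the previous listing.
with open(tempfilename, "w") as hello:
    for c in newlist:
        if c not in oldlist:
            hello.write(c + "\n")

# The new listing becomes the old listing for the next comparison.
os.remove("oldlist.txt")
os.rename("newlist.txt", "oldlist.txt")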

Raymond Hettinger · Accepted Answer · 2011-10-20 21:30:50Z

3

It's a little easier and faster to convert the lists to sets and not worry about sorting.

# Entries present in the new listing but not in the old one
for filename in set(newlist) - set(oldlist):
    print('New file: ', filename)

Also, instead of saving the list to a file as raw text, you could use the shelve module to make a persistent store that is conveniently accessible like a regular Python dict.

Otherwise, your code has the virtues of being simple and straightforward.

Here's a worked out example:

from ftplib import FTP
import shelve

olddir = shelve.open('filelist.shl')   # create a persistent dictionary

ftp = FTP('ftp1.freebsd.org')
ftp.login()                            # anonymous login

result = []
ftp.dir(result.append)                 # collect one listing line per entry
ftp.quit()
newdir = set(result[1:])               # drop the first line of the listing

print(' New Files '.center(50, '='))
for line in sorted(newdir - set(olddir)):
    print(line)
    olddir[line] = ''                  # remember this entry for the next run
print(' Done '.center(50, '='))
olddir.close()
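
If the new entries should go to a text file instead of being printed, the same comparison can be wrapped in a small function. This is only a sketch, assuming Python 3.4 or later; the name `record_new_entries` and its parameters are made up for illustration, and the listing is passed in as a plain list so the logic can be tried without a server.

```python
import shelve


def record_new_entries(listing, shelf_path, changes_path):
    """Write the lines of `listing` not yet in the shelf to `changes_path`.

    Returns the sorted list of new lines and remembers them in the shelf.
    The changes file is rewritten on every call, so it is empty when
    nothing new was found.
    """
    with shelve.open(shelf_path) as seen:
        # Set difference: lines in the current listing that were never stored
        new_lines = sorted(set(listing) - set(seen))
        for line in new_lines:
            seen[line] = ''
    with open(changes_path, 'w') as out:
        for line in new_lines:
            out.write(line + '\n')
    return new_lines
```

In real use, `listing` would be the list filled by `ftp.dir(result.append)` as in the example above.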

edited Oct 20, 2011 at 21:30

answered Oct 20, 2011 at 20:28

Raymond Hettinger

3 Comments

user1005974 Over a year ago

So what you are saying is that instead of creating an oldlist.txt file, I just keep it stored somewhere where I am able to compare the new list to the module, and then output my changes to a file? Sorry if that is not correct, I am very new. Could you show an example as well? Thanks so much for your help! Also, when I run the code you gave me it just shows me the whole list from the FTP server. I just need the changes, if there are any.

user1005974 Over a year ago

Thank you for your example, Raymond. I will try this now and see what it comes up with. Thank you again.

user1005974 Over a year ago

Raymond, your code seems to work great, but I am going to swap the print line and write to a text file instead so that I can kick off another process. Thank you for your help.

wberry · Answer · 2011-10-20 21:39:49Z

0

Your implementation of this scheme is reasonable. I would not choose this scheme to implement automated FTP messaging, if that is what you're doing. There are two weaknesses of this approach:

1. It does not support filenames that repeat. Any filename that occurs in the "old" history will not be detected as a new file. Maybe this is a problem for you, maybe not. But even if filenames are guaranteed unique now, that may not always be true.
2. It does not tell you whether a new file is ready to be consumed or not. It is possible that a new file will be processed while it is still being uploaded. Some people apply a "no change in size for X seconds" rule, but that just increases delay and still leaves a vulnerability to severed connections.
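
For reference, the "no change in size for X seconds" rule mentioned in the second point could look roughly like this. It is a sketch only: `is_size_stable` and its parameters are hypothetical names, it relies on `ftplib.FTP.size()`, which some servers only answer in binary mode, and it keeps the weakness described above, since a stalled or severed upload also shows a stable size.

```python
import time


def is_size_stable(ftp, name, wait_seconds=5, sleep=time.sleep):
    """Return True if the size of `name` is unchanged after `wait_seconds`.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    `sleep` is injectable so the check can be exercised without waiting.
    """
    first = ftp.size(name)
    sleep(wait_seconds)
    second = ftp.size(name)
    # A size of None means the server gave no usable answer; treat as unstable
    return first is not None and first == second
```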

One scheme that is similar but does not have either of these two problems is to actually store a file on the server with a reserved name, or in a separate place, and use its timestamp (preferably the modification time of the file itself) to decide which files can be safely processed. This "semaphore" file is updated to the current time as the last step in uploading a file. All files with a modification time older than the semaphore timestamp can be processed. Once processed, all files must be deleted out of the upload folder so they won't be processed twice. I have seen this scheme work well in an automated production data flow.
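
A rough sketch of the consumer side of this semaphore scheme follows. It assumes the server supports the `MDTM` command (RFC 3659), which replies with something like `213 20111020213949`; the function names and the idea of passing in a list of candidate names are illustrative, not part of the original answer.

```python
from datetime import datetime


def parse_mdtm(response):
    """Parse an MDTM reply such as '213 20111020213949' into a datetime."""
    stamp = response.split()[1][:14]  # keep YYYYMMDDHHMMSS, drop fractions
    return datetime.strptime(stamp, '%Y%m%d%H%M%S')


def files_ready(ftp, names, semaphore_name):
    """Return the names whose modification time is older than the semaphore's.

    `ftp` is expected to behave like a logged-in ftplib.FTP object.
    """
    cutoff = parse_mdtm(ftp.sendcmd('MDTM ' + semaphore_name))
    return [name for name in names
            if name != semaphore_name
            and parse_mdtm(ftp.sendcmd('MDTM ' + name)) < cutoff]
```

After processing, the returned files would still need to be deleted from the upload folder, as described above, so they are not picked up again.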

answered Oct 20, 2011 at 21:39

wberry

2 Comments

user1005974 Over a year ago

wberry, thank you for your response. I have no trouble with files having the exact same name; every time the client sends us a file, it has a date and time stamp included in the file name. I am not really sure what you mean in your second paragraph. I kind of have an idea but would not be sure how to implement that with my code. Thanks so much!

wberry Over a year ago

My preferred method of preventing partial file consumption is actually to upload files to a temp folder and then move them into the final folder after they are uploaded. The consumer may then process any files it sees and delete them. But FTP server permissions must allow moving files for this to work.
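
On the uploading side, that could be sketched as below. The folder names `incoming_tmp` and `incoming` and the function name are placeholders, both folders are assumed to exist already, and the server is assumed to allow renaming a file from one folder to the other.

```python
def upload_then_move(ftp, local_file, name,
                     temp_dir='incoming_tmp', final_dir='incoming'):
    """Upload to a temp folder, then rename into the final folder.

    `ftp` is expected to behave like a logged-in ftplib.FTP object and
    `local_file` is a file object opened in binary mode.
    """
    temp_path = temp_dir + '/' + name
    final_path = final_dir + '/' + name
    ftp.storbinary('STOR ' + temp_path, local_file)  # upload out of sight
    ftp.rename(temp_path, final_path)                # publish once complete
    return final_path
```

Because the rename only happens after the upload has finished, a consumer watching the final folder should not see partially uploaded files.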
