
My main goal is to check an FTP server at any time for new files and then generate a .txt file listing only the new files. If there are no new files, it should produce nothing. Here is what I have so far. I started by saving the server's file listing to oldlist.txt. Each time I connect to the FTP site, I compare newlist.txt against oldlist.txt and write the differences to Temporary FTP file Changes.txt. After each connection I turn newlist.txt into oldlist.txt so that I can compare against it the next time I connect. Is there a better way to do this? My lists never seem to change between runs. Sorry if this is confusing, and thanks.

import os
from ftplib import FTP

filename = "oldlist.txt"
testing = "newlist.txt"
tempfilename = "Temporary FTP file Changes.txt"

# Read the previous listing. readlines() would keep the trailing "\n",
# so strip it; otherwise no line ever matches an entry from ftp.dir().
with open(filename, "r") as old:
    oldlist = [line.rstrip("\n") for line in old]

ftp = FTP("ftpsite", "username", "password")
ftp.set_pasv(False)
newlist = []
ftp.dir(newlist.append)
newlist.sort()
ftp.close()

# Save the current listing for the next run.
with open(testing, "w") as bob:
    for nl in newlist:
        bob.write(nl + "\n")

# Write only the entries that were not in the previous listing.
with open(tempfilename, "w") as hello:
    for c in newlist:
        if c not in oldlist:
            hello.write(c + "\n")

# The new listing becomes the old listing for the next comparison.
os.remove(filename)
os.rename(testing, filename)

2 Answers


It's a little easier/faster to convert the lists to a set and not worry about sorting.

for filename in set(newlist) - set(oldlist):
    print('New file: ' + filename)
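One caveat, assuming `oldlist` was read with `readlines()` as in the question: those lines still end in a newline, while the entries collected by `ftp.dir()` do not, so every entry looks new unless the newlines are stripped first. A minimal sketch with made-up listing lines (the file names are placeholders, not real server output):

```python
# Hypothetical listing lines; real ones come from oldlist.txt and ftp.dir().
oldlist = ["a.txt\n", "b.txt\n"]          # as returned by readlines()
newlist = ["a.txt", "b.txt", "c.txt"]     # as collected by ftp.dir()

# Strip the trailing newlines before taking the set difference.
old_entries = set(line.rstrip("\n") for line in oldlist)
new_files = sorted(set(newlist) - old_entries)

print(new_files)
```

Without the `rstrip`, the difference would contain all three entries.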

Also, instead of saving the list to a file as raw text, you could use the shelve module to make a persistent store that is conveniently accessible like a regular Python dict.

Otherwise, your code has the virtues of being simple and straightforward.

Here's a worked out example:

from ftplib import FTP
import shelve

olddir = shelve.open('filelist.shl')   # create a persistent dictionary

ftp = FTP('ftp1.freebsd.org')
ftp.login()                            # anonymous login

result = []
ftp.dir(result.append)
ftp.quit()
newdir = set(result[1:])               # skip the first line of the listing

print(' New Files '.center(50, '='))
for line in sorted(newdir - set(olddir)):
    print(line)
    olddir[line] = ''                  # remember this entry for the next run
print(' Done '.center(50, '='))
olddir.close()
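If the new entries should go to a text file rather than the screen, the loop can be wrapped in a small helper. This is only a sketch: the function name `record_new_entries` and the output file name are made up for illustration, and `seen` can be any dict-like store such as an open shelve.

```python
def record_new_entries(current, seen, out_path):
    """Write entries of `current` that are not yet in `seen` to out_path.

    `seen` is any dict-like store (for example an open shelve).
    Returns the sorted list of new entries; the file is left empty
    when there is nothing new.
    """
    new_entries = sorted(set(current) - set(seen))
    with open(out_path, "w") as out:
        for entry in new_entries:
            out.write(entry + "\n")
            seen[entry] = ""          # remember it for the next run
    return new_entries
```

With the names from the example above, the call would look like `record_new_entries(newdir, olddir, "changes.txt")`, made before `olddir.close()`.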

3 Comments

So what you are saying is that instead of creating an oldlist.txt file, I should keep the list stored somewhere where I can compare the new list against it, and then output my changes to a file? Sorry if that is not correct; I am very new. Could you show an example as well? Thanks so much for your help! Also, when I run the code you gave me, it just shows me the whole list from the FTP server. I just need the changes, if there are any.
Thank you for your example, Raymond. I will try this now and see what it comes up with. Thank you again.
Raymond, your code seems to work great, but I am going to swap the print line and write to a text file instead so that I can kick off another process. Thank you for your help.

Your implementation of this scheme is reasonable. I would not choose this scheme to implement automated FTP messaging, if that is what you're doing. There are two weaknesses of this approach:

  • It does not support filenames that repeat. Any filename that occurs in the "old" history will not be detected as a new file. Maybe this is a problem for you, maybe not. But even if filenames are guaranteed unique now, that may not always be true.
  • It does not tell you whether a new file is ready to be consumed or not. It is possible that a new file will be processed while it is still being uploaded. Some people apply a "no change in size for X seconds" rule, but that just increases delay and still leaves a vulnerability to severed connections.

One scheme that is similar but does not have either of these two problems is to actually store a file on the server with a reserved name, or in a separate place, and use its timestamp (preferably the modification time of the file itself) to decide which files can be safely processed. This "semaphore" file is updated to the current time as the last step in uploading a file. All files with a modification time older than the semaphore timestamp can be processed. Once processed, all files must be deleted out of the upload folder so they won't be processed twice. I have seen this scheme work well in an automated production data flow.
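The selection step of that scheme could be sketched roughly as follows. The function name `ready_files` and the semaphore name `upload.done` are made up for illustration, and the timestamps are assumed to be `YYYYMMDDHHMMSS` strings in a single consistent format, so that plain string comparison orders them chronologically.

```python
def ready_files(mtimes, semaphore_name):
    """Return the names modified strictly before the semaphore file.

    `mtimes` maps file name -> modification time as a 'YYYYMMDDHHMMSS'
    string; strings in that one format sort chronologically.
    """
    cutoff = mtimes[semaphore_name]
    return sorted(name for name, stamp in mtimes.items()
                  if name != semaphore_name and stamp < cutoff)


# Hypothetical way to build `mtimes`, assuming the server supports MLSD
# and `ftp` is a logged-in ftplib.FTP instance:
# mtimes = {name: facts["modify"] for name, facts in ftp.mlsd()
#           if facts.get("type") == "file"}

example = {
    "a.csv": "20240101120000",
    "b.csv": "20240101120500",        # modified after the semaphore
    "upload.done": "20240101120100",  # hypothetical semaphore file
}
print(ready_files(example, "upload.done"))
```

Here only `a.csv` is reported as ready, because `b.csv` was modified after the semaphore was last touched.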

2 Comments

wberry, thank you for your response. I have no trouble with files having the exact same name; every time the client sends us a file, it has a date and time stamp included in the file name. I am not really sure what you mean in your second paragraph. I kind of have an idea but am not sure how to implement that with my code. Thanks so much!
My preferred method of preventing partial file consumption is actually to upload files into a temp folder, and then move them into the final folder after they are uploaded. The consumer may then process any files it sees, then delete them. But FTP server permissions must allow moving files for this to work.
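On the uploading side, that could look something like the sketch below. The function name `upload_then_move` and the folder names `incoming_tmp` and `incoming` are placeholders, and `ftp` is assumed to be a logged-in `ftplib.FTP` instance whose account is allowed to rename files.

```python
def upload_then_move(ftp, local_path, name,
                     temp_dir="incoming_tmp", final_dir="incoming"):
    """Upload into temp_dir, then rename into final_dir once complete."""
    temp_path = temp_dir + "/" + name
    final_path = final_dir + "/" + name
    with open(local_path, "rb") as fh:
        ftp.storbinary("STOR " + temp_path, fh)
    # The consumer only watches final_dir, so the file shows up there
    # only after the upload has finished.
    ftp.rename(temp_path, final_path)
    return final_path
```

If the connection drops mid-upload, the partial file stays in the temp folder and the consumer never sees it.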
