I'm obviously doing something very wrong. I'd like to find files that are in one directory but have no counterpart in a second directory (for instance, xxx.phn in one directory and xxx.wav in the second).

It seems that I cannot detect when a file is NOT present in the second directory: the script always behaves as if every file were there. No files are displayed, although orphaned files do exist.

import shutil, random, os, sys

if len(sys.argv) < 4:
    print """usage: python del_orphans_dir1_dir2.py source_folder source_ext dest_folder dest_ext
             """
    sys.exit(-1)

folder = sys.argv[1]
ext  = sys.argv[2]
dest_folder = sys.argv[3]
dest_ext  = sys.argv[4]
i = 0

for d, ds, fs in os.walk(folder):
    for fname in fs:
        basename = os.path.splitext(fname)[0]
        if (not os.path.exists(dest_folder+'/'+basename + '.' + dest_ext) ):
            print str(i)+': No duplicate for: '+fname
            i=i+1      

print str(i)+' files found'
  • FWIW There's no need to do the str(i) call. Just do print i, " files found". And i+=1 rather than i=i+1 works. Commented Jan 15, 2015 at 0:29
  • If os.path.exists doesn't work correctly, why do you need more than one or two lines to demonstrate same? Having the extra code just creates other places (not related to os.path.exists) the bug could hide. Commented Jan 15, 2015 at 0:32
  • Similarly, recalculating the full name once for your print statement and again for the actual exists() call means that there could be a subtle difference between them. Ideally, a question like this would show ls -l on the output from the print showing the file to exist, and an error message from the script showing it not to, and would use a variable assigned only once for both the print call and the exists() call, to avoid any chance of such bugs. Commented Jan 15, 2015 at 0:34
  • I have an unequal number of files in the two directories, and still the script reports '0 files found'. I can remove that print, but this doesn't solve the problem... Commented Jan 15, 2015 at 0:40
  • @user2064070. Please edit your question to show the output from print sys.argv[1:], and also for x in os.walk(path): print x for both folder and dest_folder. (You might want to try setting up some test folders with only a few files in before doing this, though). Commented Jan 15, 2015 at 1:19
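
The diagnostic suggested in the last comment could be sketched roughly as follows. This is written for Python 3 (the thread itself uses Python 2 print statements), and describe_tree is a hypothetical helper name, not something from the original script:

```python
import os
import sys


def describe_tree(path):
    """Return one (directory, subdirectories, files) tuple per directory under path."""
    return [(d, sorted(ds), sorted(fs)) for d, ds, fs in os.walk(path)]


if __name__ == "__main__":
    # Show exactly what the script received on the command line.
    print(sys.argv[1:])
    # Assumption for this sketch: every argument is treated as a folder to walk.
    # os.walk silently yields nothing for a path that is not a directory.
    for folder in sys.argv[1:]:
        for entry in describe_tree(folder):
            print(entry)
```

Running it against two small test folders makes it easy to see whether the walked names and the command-line arguments are what you expect.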

3 Answers

Can I suggest that you build the filename you're checking for once, and print it before checking whether it exists:

dest_fname = dest_folder+'/'+basename + '.' + dest_ext
print "dest exists? %s" % dest_fname
os.path.exists(dest_fname)

Also, as an aside, please join paths using os.path.join(). (If you really want the base name without the leading path elements, there's an os.path.basename() function.)
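
As a rough illustration of that aside, here is a minimal Python 3 sketch that builds the destination path with os.path.join() and os.path.basename(). The helper name dest_path is made up for this example; note that the question uses basename as a variable for the name without its extension, which is what stem holds here:

```python
import os


def dest_path(dest_folder, fname, dest_ext):
    """Build the path of the file expected in dest_folder for a source file name."""
    # os.path.basename drops any leading directories; splitext drops the extension.
    stem = os.path.splitext(os.path.basename(fname))[0]
    # os.path.join inserts the right separator for the current platform.
    return os.path.join(dest_folder, stem + "." + dest_ext)
```

For example, dest_path("b", "xxx.phn", "wav") gives the same path as os.path.join("b", "xxx.wav"), whatever the platform separator is.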

2 Comments

No, os.walk does not work that way: fname will always be a filename, not a path. And in any case, the OPs code is testing whether the file doesn't exist, but that is evaluating to False for every file. That is why the final output is '0 files found' (i.e. no duplicates were found).
True.. I'll edit out the misleading stuff. I still think he should print the paths he's looking for. Probably more of a comment than an answer now..

I tried your program out and it worked for two simple flat directories. Here are the directory contents:

a\a.txt
a\b.txt      # Missing from b directory
a\c.txt
b\a.csv
b\c.csv

And here is the result of your script with a txt b csv as parameters. If your result was different, maybe you used different parameters?

0: No duplicate for: b.txt
1 files found

But when I added subdirectories:

a\a.txt
a\b.txt      # Missing from b directory
a\c.txt
a\c\d.txt
a\c\e.txt    # Missing from b\c directory
b\a.csv
b\c.csv
b\c\d.csv

Your script gives:

0: No duplicate for: b.txt
1: No duplicate for: d.txt      # Error here
2: No duplicate for: e.txt
3 files found

To work with sub-directories you need to compute the path relative to the source directory, and then add it to the destination directory. Here's the result with a few other minor cleanups and prints to see what is going on. Note that fname is always just the file name and needs to be joined with d to get the whole path:

#!python2
import os, sys

if len(sys.argv) < 5:  # script name plus four arguments
    print """usage: python del_orphans_dir1_dir2.py source_folder source_ext dest_folder dest_ext
             """
    sys.exit(-1)

folder = sys.argv[1]
ext  = sys.argv[2]
dest_folder = sys.argv[3]
dest_ext  = sys.argv[4]
i = 0

for d, ds, fs in os.walk(folder):
    for fname in fs:
        relpath = os.path.relpath(os.path.join(d,fname),folder)
        relbase = os.path.splitext(relpath)[0]
        path_to_check = os.path.join(dest_folder,relbase+'.'+dest_ext)
        if not os.path.exists(path_to_check):
            print '{}: No duplicate for: {}, {} not found.'.format(i,os.path.join(folder,relpath),path_to_check)
            i += 1

print i,'files found'

Output:

0: No duplicate for: a\b.txt, b\b.csv not found.
1: No duplicate for: a\c\e.txt, b\c\e.csv not found.
2 files found
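
For anyone on Python 3, the same relative-path idea might look something like the sketch below. It is an adaptation, not the script above: find_orphans is a hypothetical function name, and it returns the missing pairs instead of printing them so the result is easy to check:

```python
import os


def find_orphans(folder, dest_folder, dest_ext):
    """Return (source_path, expected_path) pairs whose counterpart is missing."""
    orphans = []
    for d, ds, fs in os.walk(folder):
        for fname in fs:
            # fname is a bare file name, so join it with d before making it relative.
            relpath = os.path.relpath(os.path.join(d, fname), folder)
            relbase = os.path.splitext(relpath)[0]
            path_to_check = os.path.join(dest_folder, relbase + "." + dest_ext)
            if not os.path.exists(path_to_check):
                orphans.append((os.path.join(folder, relpath), path_to_check))
    return sorted(orphans)
```

Like the script above, this sketch does not filter on the source extension; it checks every file found under folder.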


What you're looking for are matching files, not duplicate ones. One problem is that you're not using the source_ext argument when searching. Another, I think, is that the command-line argument handling is messed up. Here's a corrected version that accomplishes what you're trying to do:

import os
import sys

if len(sys.argv) != 5:
    print("usage: python "
          "del_orphans_dir1_dir2.py "  # argv[0] (script name)
          "source_folder "             # argv[1]
          "source_ext "                # argv[2]
          "dest_folder "               # argv[3]
          "dest_ext")                  # argv[4]
    sys.exit(2)  # command line error

source_folder, source_ext, dest_folder, dest_ext = sys.argv[1:5]
# os.path.splitext returns extensions with a leading dot, so normalize both.
source_ext = source_ext if source_ext.startswith('.') else '.'+source_ext
dest_ext = dest_ext if dest_ext.startswith('.') else '.'+dest_ext

found = 0
for d, ds, fs in os.walk(source_folder):
    for i, fname in enumerate(fs, start=1):
        basename, ext = os.path.splitext(fname)
        if ext == source_ext:
            if os.path.exists(os.path.join(dest_folder, basename+dest_ext)):
                found += 1
            else:
                print '{}: No matching file found for: {}'.format(i, fname)

print '{} matches found'.format(found)
sys.exit(0)
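
The extension handling is the part most likely to trip people up, so here is a small Python 3 sketch of it in isolation. The names with_dot and split_matches are invented for this example, and it works on a plain list of file names rather than walking a directory:

```python
import os


def with_dot(ext):
    """Return ext with a leading dot, matching what os.path.splitext produces."""
    return ext if ext.startswith(".") else "." + ext


def split_matches(names, source_ext, dest_folder, dest_ext):
    """Partition the names that have source_ext into (matched, unmatched) lists."""
    source_ext, dest_ext = with_dot(source_ext), with_dot(dest_ext)
    matched, unmatched = [], []
    for fname in names:
        basename, ext = os.path.splitext(fname)
        if ext != source_ext:
            continue  # ignore files with other extensions
        target = os.path.join(dest_folder, basename + dest_ext)
        (matched if os.path.exists(target) else unmatched).append(fname)
    return matched, unmatched
```

Normalizing both extensions means the comparison with the result of os.path.splitext behaves the same whether the user types phn or .phn on the command line.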
