Python noob... please be gentle. In my current program, I have a list of 3 files which may or may not exist in my current directory. For each file that does exist, I want to assign it values (hashcolumn and filepathNum) to be used later in other functions. A file that does not exist should not be assigned any values. The code I have so far is below:
import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
        return fname, hashcolumn, filepathNum

def removedupes(infile, outfile, hashcolumn):
    fname, hashcolumn, filepathNum = chkifexists()
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] == "":
            w2.writerow(row)
            hashes.add(row[hashcolumn])
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()

def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count to determine
    the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1
    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2
    print str(count1 - count2) + " duplicate rows removed from " + str(origfile1) +"!"

def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    fname, hashcolumn, filepathNum = chkifexists()
    removedupes(fname, os.path.splitext(fname)[0] + "2.csv", hashcolumn)
    bakcount(fname, os.path.splitext(fname)[0] + "2.csv")

CleanAndPrettify()
The problem I am running into is that the code runs through the list and stops at the first valid file it finds.
I'm not sure whether I'm thinking about this the wrong way, but I thought I was doing it right.
Current output of this program with A.csv, B.csv, and C.csv present in the same directory:
Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
The desired output is:
Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
5 duplicate rows removed from B.csv!
8 duplicate rows removed from C.csv!
...and then continue on with the next portion of creating the .bak files.

The output of this program without any CSV files in the same directory:
UnboundLocalError: local variable 'hashcolumn' referenced before assignment
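For reference, this kind of error can be reproduced without any CSV files. Python raises it when a function reads a local name that was never assigned during that call. The sketch below is only an illustration; the function name pick and the variable names are made up and are not part of the program above.

```python
def pick(flag):
    # 'value' is only assigned when flag is true.
    if flag:
        value = 1
    # When flag is false, 'value' was never assigned in this call,
    # so reading it here raises UnboundLocalError.
    return value

try:
    pick(False)
except UnboundLocalError as exc:
    # Keep the message so it can be inspected or printed.
    message = str(exc)
```

The same thing seems to happen in chkifexists() when the first file in the list is missing: none of the branches that assign hashcolumn has run by the time the return statement is reached.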
You return from chkifexists() as soon as it finds the first occurrence. Are you calling chkifexists() multiple times? I am unable to grasp your problem completely.
Where are the hashcolumn and filepathNum values coming from? What do they mean? Why isn't that information stored in the actual files somehow?
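One possible way to get a result for every file rather than only the first is to collect the settings for each file that exists and let the caller loop over them. The following is a minimal sketch under some assumptions, not a drop-in replacement: the FILE_SETTINGS table, the existing_files name, and the directory parameter are invented for illustration, while the column numbers are copied from the question. It returns an empty list when no file is present, which would also avoid the UnboundLocalError.

```python
import os

# Hypothetical lookup table; the numbers are the ones used in the question.
# Each entry maps a file name to (hashcolumn, filepathNum).
FILE_SETTINGS = {
    'A.csv': (7, 5),
    'B.csv': (15, 5),
    'C.csv': (1, 0),
}

def existing_files(directory='.'):
    """Return (fname, hashcolumn, filepathNum) for every known file that exists."""
    found = []
    for fname in sorted(FILE_SETTINGS):
        if os.path.isfile(os.path.join(directory, fname)):
            hashcolumn, filepathNum = FILE_SETTINGS[fname]
            found.append((fname, hashcolumn, filepathNum))
    # The return sits after the loop, so every file name is checked.
    return found

# Usage sketch (removedupes and bakcount are the functions from the question):
# for fname, hashcolumn, filepathNum in existing_files():
#     outfile = os.path.splitext(fname)[0] + "2.csv"
#     removedupes(fname, outfile, hashcolumn)
#     bakcount(fname, outfile)
```

With this approach, the call to chkifexists() inside removedupes would presumably have to be removed as well, since it overwrites the hashcolumn argument that the caller passes in.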