I have two moderately large ASCII files containing data in a fixed format. I need to test whether six given fields on a line of the first file match (within a given tolerance) six fields on any line of the second file, and then output the matching lines so I can continue processing.
I am currently splitting each line in a file using a Fortran-style line reader, and generating a list of lists with the correct type for each element in each list. I am storing the lists of lists from both files in memory whilst I operate on them.
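For illustration, the parsing step looks roughly like the sketch below. The column positions in `FIELDS` and the name `parse_fixed_line` are made up for this example and are not the real layout of my files; the point is only that each line becomes a list of correctly typed values.

```python
def parse_fixed_line(line, fields):
    """Slice a fixed-width line into typed values.

    fields is a list of (start, end, converter) tuples; the layout is
    hypothetical and must be adapted to the actual file format.
    """
    return [conv(line[start:end].strip()) for start, end, conv in fields]


# Hypothetical layout: a 4-character label followed by two 8-character floats.
FIELDS = [(0, 4, str), (4, 12, float), (12, 20, float)]

example = parse_fixed_line("AB      1.50   -2.25", FIELDS)
```

Applying this to every line of a file gives the list of lists described above.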
The fields I need to compare are all floats and I am currently using the following type of flow:
tol = 0.01
for entry1 in file1List:
    for entry2 in file2List:
        if (abs(entry1[1] - entry2[1]) < tol and abs(entry1[2] - entry2[2]) < tol
                and abs(entry1[3] - entry2[3]) < tol and abs(entry1[4] - entry2[4]) < tol
                and abs(entry1[5] - entry2[5]) < tol and abs(entry1[6] - entry2[6]) < tol):
            print entry1, entry2
The execution of this is fine for files containing only a small number of lines, but for files of 30000 lines this part alone takes over a minute!
I am fairly certain there must be a much faster comparison method, but I am struggling to find it. Any help would be appreciated.
NumPy? That could probably speed things up a bit.
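A minimal sketch of what that might look like, assuming fields 1 to 6 of every entry are floats as in the question. The function name `find_matches` and its parameters are invented for this example. It still does the same number of comparisons overall, but the inner loop over the second file is replaced by a single vectorized NumPy operation per line of the first file, which is usually much faster than a pure Python loop.

```python
import numpy as np


def find_matches(file1_list, file2_list, tol=0.01, cols=slice(1, 7)):
    """Return (i, j) index pairs where fields 1..6 of file1_list[i] and
    file2_list[j] all differ by less than tol.

    Assumes the selected fields are numeric (hypothetical helper).
    """
    # Pull out only the six float fields into 2-D arrays of shape (n, 6).
    a = np.array([row[cols] for row in file1_list], dtype=float)
    b = np.array([row[cols] for row in file2_list], dtype=float)

    pairs = []
    for i, row in enumerate(a):
        # Compare this row against every row of the second file at once.
        close = (np.abs(b - row) < tol).all(axis=1)
        pairs.extend((i, int(j)) for j in np.nonzero(close)[0])
    return pairs


file1 = [["a", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
         ["b", 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]]
file2 = [["x", 1.005, 2.0, 3.0, 4.0, 5.0, 6.0],
         ["y", 1.0, 2.0, 3.0, 4.0, 5.0, 6.5]]
matches = find_matches(file1, file2)
```

The returned index pairs can then be used to look up the original entries, for example `file1[i]` and `file2[j]`, for further processing.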