I have two CSV files. I am reading them without the csv reader because the lines are inconsistent: some lines are wrapped in quotation marks and some are not, and this was throwing off the csv reader. Both files have the same format but different entries, so they look something like this:
a b c d e f g h i j h i j k
"a b c d e f g h i j h i j k j"
"a b c d e f g h i j h i j k j"
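One way to handle the mixed quoting without the csv reader is to strip the surrounding quotation marks before splitting. This is only a minimal sketch: it assumes the fields are separated by whitespace, as in the sample lines above, and `third_column` is a made-up helper name.

```python
def third_column(line):
    """Return the third whitespace-separated field, or None if the line is too short."""
    # Drop the trailing newline first, then any wrapping double quotes.
    fields = line.strip().strip('"').split()
    return fields[2] if len(fields) > 2 else None


print(third_column('a b c d e'))       # unquoted line
print(third_column('"a b c d e"\n'))   # quoted line
```

Both calls should yield the same field, since the quotes are removed before the split.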
What I need to do is find all the lines in file 1 and file 2 that have the same value in the third column (c). Note that the rest of the values will be quite different, so I don't think something like difflib will work, unless I've missed something.
At first I tried using a nested for loop, something like this:
    for line in fileOne:
        entry = line.split()
        print("A")
        for row in fileTwo:
            space = row.split()
            print("B")
            if space[2] == entry[2]:
                outputHandle.write(line)
but I found using print statements that this was outputting
A
B
B
B
A
A
I need the script to check through all the lines of the second file for each line in the first file so it would look like this:
A
B
B
B
A
B
B
B....etc
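The A B B B A A pattern is what a file object produces when it is looped over twice: after the first full pass it is exhausted, so later passes yield nothing. A minimal sketch of the repeated scan, assuming the second file is small enough to hold in memory; the `io.StringIO` objects and their contents are invented stand-ins for the real open files.

```python
import io

# Stand-ins for the two open files; real code would use open(...).
file_one = io.StringIO('a b c d\n"e f g h"\n')
file_two = io.StringIO('"x y c z"\nq r g s\nm n o p\n')

# Read the second file into a list once, so it can be looped over
# again for every line of the first file.
rows_two = file_two.readlines()

matches = []
for line in file_one:
    entry = line.strip().strip('"').split()
    for row in rows_two:
        space = row.strip().strip('"').split()
        if space[2] == entry[2]:
            matches.append(line)

print(matches)
```

An alternative that keeps the file object is to call `fileTwo.seek(0)` before each inner loop, which rewinds it to the start.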
(This is very expensive, I know. But I am just starting out and, sadly, not sure how to do this more efficiently.)
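A cheaper alternative to the nested loop is probably to collect the third-column values of the second file into a set, then make a single pass over the first file. The sketch below is written under assumptions: `key_of` and `matching_lines` are hypothetical names, fields are whitespace-separated, and only the matching lines from the first file are returned.

```python
def key_of(line):
    """Third whitespace-separated field of a line, ignoring wrapping quotes."""
    fields = line.strip().strip('"').split()
    return fields[2] if len(fields) > 2 else None


def matching_lines(lines_one, lines_two):
    """Return the lines of lines_one whose third field also occurs in lines_two."""
    keys_two = {key_of(row) for row in lines_two}
    keys_two.discard(None)  # ignore rows that are too short to have a third field
    return [line for line in lines_one if key_of(line) in keys_two]


result = matching_lines(
    ['a b c d\n', '"e f g h"\n', 'i j k l\n'],
    ['"x y c z"\n', 'q r g s\n'],
)
print(result)
```

Each file is read once, and set membership tests take roughly constant time, so the work grows with the total number of lines rather than with the product of the two file lengths.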
I also tried using a function:
    def file_check(variableName):
        for row in fileTwo:
            return("B")
            if entry in row:
                return ("found")
        return("not found")

    for line in fileOne:
        entry = line.split()
        print("A")
        var = file_check(entry[2])
        print(var)
This outputs: A ('Not found') A ('Not found') A ('Not found')
Since I am using test files, I KNOW that there are matching entries and so this is also not looping through the second file, but rather checking only the first line.
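A `return` statement ends the function the first time it runs, so a `return` placed directly inside the loop stops the search at the first row. A hedged sketch of a version that checks every row: it takes the rows as a list (an assumption, so that each call starts from the top) and compares the third field rather than testing substring membership.

```python
def file_check(key, rows):
    """Report whether any row has `key` as its third whitespace-separated field."""
    for row in rows:
        fields = row.strip().strip('"').split()
        if len(fields) > 2 and fields[2] == key:
            return "found"      # stop as soon as one row matches
    return "not found"          # reached only after every row was checked


rows_two = ['"x y c z"\n', 'q r g s\n']  # invented sample rows
print(file_check("g", rows_two))
print(file_check("k", rows_two))
```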
Sorry to ask such a basic question, StackOverflowians, but I'm really stuck this time. ANY advice is welcome and appreciated!!!
NOTE: this question HAS been asked before, but the answers only work for Python 2; the csv module in Python 3 seems to be really different. Here is the previous version of this question: Comparing two CSV Files Based on Specific Data in two Columns