I have two CSV files. file1.csv:

col1,col2,col3
1,2,3
4,5,6
7,8,9

file2.csv:

col1,col2,col3
0,2,3
4,0,6
7,8,9

I want to compare these two files column-wise and write the result to another file, file3.csv:

col1,col2
1,0
0,5
0,0

The code I tried:

import csv
with open('file1.csv', 'r') as t1:
    old_csv = t1.readlines()
with open('file2.csv', 'r') as t2:
    new_csv = t2.readlines()

with open('file3.csv', 'w') as out_file:
    line_in_new = 1
    line_in_old = 1
    leng=len(new_csv)
    out_file.write(new_csv[0])
    while line_in_new < len(new_csv) and line_in_old < len(old_csv):
        if (old_csv[line_in_old]) != (new_csv[line_in_new]):
            out_file.write(new_csv[line_in_new])
        else:
            line_in_old += 1
        line_in_new += 1

This is a slightly altered version of one of the answers here on Stack Overflow. How can I achieve a column-wise comparison?

Thanks in advance!

  • How did you arrive at the file3.csv output? What kind of comparison are you doing? Commented Jan 20, 2021 at 10:42
  • What do you mean by: I want to compare these two files column wise output the result to another file. file3.csv? Commented Jan 20, 2021 at 10:47

3 Answers

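Once you have read the lines of both files (old_csv and new_csv in your code), you can compare them value by value:

with open('file3.csv', 'w') as new_file:
    new_file.write(new_csv[0])  # copy the header row; it cannot be converted to int
    for i in range(1, min(len(old_csv), len(new_csv))):
        new_values = new_csv[i].strip().split(",")
        old_values = old_csv[i].strip().split(",")
        # you can add slicing here ([start:stop]) to only select certain columns
        diffs = [str(int(n) - int(o)) for n, o in zip(new_values, old_values)]
        new_file.write(",".join(diffs) + "\n")  # write whatever you want, e.g. new - old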
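
The same idea can also be sketched with the csv module, which takes care of splitting and joining the fields. This is only an illustration under a few assumptions: the sample data from the question is inlined through io.StringIO so the snippet runs on its own, the subtraction order (file1 minus file2) is inferred from the file3.csv shown in the question, and only the first two columns are kept via slicing.

```python
import csv
import io

# Stand-ins for file1.csv and file2.csv (data copied from the question)
old_text = "col1,col2,col3\n1,2,3\n4,5,6\n7,8,9\n"
new_text = "col1,col2,col3\n0,2,3\n4,0,6\n7,8,9\n"

old_rows = list(csv.reader(io.StringIO(old_text)))
new_rows = list(csv.reader(io.StringIO(new_text)))

out = io.StringIO()  # stand-in for open('file3.csv', 'w', newline='')
writer = csv.writer(out, lineterminator="\n")
writer.writerow(old_rows[0][:2])  # header, first two columns only

for old_row, new_row in zip(old_rows[1:], new_rows[1:]):
    # file1 minus file2 (assumed order), restricted to the first two columns
    writer.writerow([int(o) - int(n) for o, n in zip(old_row[:2], new_row[:2])])

result = out.getvalue()
print(result)
```

With real files, replace the io.StringIO objects with open(...) calls; the rest stays the same.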

I hope that answers your question.

If both CSV files have the same number of rows and columns, you can use pandas to get the difference quickly.

import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

diff = df1 - df2
diff.to_csv('file3.csv', index=False)

The file3.csv contents will look like:

col1,col2,col3
1,0,0
0,5,0
0,0,0
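
The file3.csv in the question keeps only col1 and col2. If that is the goal, one possible variation is to select those columns before subtracting. The sketch below assumes this reading of the question; the sample data is inlined with io.StringIO so it runs without the files, and the name wanted is only illustrative.

```python
import io
import pandas as pd

# Stand-ins for file1.csv and file2.csv (data copied from the question)
file1 = "col1,col2,col3\n1,2,3\n4,5,6\n7,8,9\n"
file2 = "col1,col2,col3\n0,2,3\n4,0,6\n7,8,9\n"
df1 = pd.read_csv(io.StringIO(file1))
df2 = pd.read_csv(io.StringIO(file2))

wanted = ["col1", "col2"]           # columns to compare (assumed from file3.csv)
diff = df1[wanted] - df2[wanted]    # element-wise difference, aligned by index and column name
print(diff)
```

Calling diff.to_csv('file3.csv', index=False) afterwards would write only those two columns.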

James's answer is correct and should solve your problem. If you want to skip some columns, such as an ID column or string columns, you can try the code below. cols is the list of columns whose difference you want to calculate.

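import pandas as pd

# Columns to compare; ID or string columns are simply left out of this list
cols = ['req_col1', 'req_col2', 'req_col3']
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Start from an empty frame; pd.DataFrame(cols) would add a stray column holding the names
df3 = pd.DataFrame()
for col in cols:
    df3[col] = df1[col] - df2[col]
df3.to_csv('filepath.csv', index=False)  # index=False keeps the row index out of the file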
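
If listing the columns by hand is inconvenient, the numeric columns could instead be picked automatically. This is a rough sketch, not part of the original answer: the sample data and the column names ID_col, name, req_col1 and req_col2 are made up for illustration, and it assumes both files share the same row order.

```python
import io
import pandas as pd

# Hypothetical sample data with an ID column and a string column
csv1 = "ID_col,name,req_col1,req_col2\n1,a,10,20\n2,b,30,40\n"
csv2 = "ID_col,name,req_col1,req_col2\n1,a,7,20\n2,b,30,35\n"
df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

# Keep numeric columns only, then drop the ID column explicitly
cols = df1.select_dtypes(include="number").columns.drop("ID_col")
diff = df1[cols] - df2[cols]
print(diff)
```

String columns are excluded by select_dtypes, while a numeric ID column still has to be dropped by name.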
