Compare a column between 2 csv files and write differences using Python

Question

I am trying to print out the differences by comparing a column between 2 csv files.

CSV1:

SERVER,   FQDN,   IP_ADDRESS,  
serverA, device1.com, 10.10.10.1  
serverA,device2.com,10.11.11.1  
serverC,device3.com,10.12.12.1   
and so on..

CSV2:

FQDN, IP_ADDRESS, SERVER,  LOCATION  
device3.com,10.12.12.1,serverC,xx  
device679.com,20.3.67.1,serverA,we  
device1.com,10.10.10.1,serverA,ac  
device345.com,192.168.2.0,serverA,ad  
device2.com,192.168.6.0,serverB,af  
and so on...

What I am looking to do is to compare the FQDN column and write the differences to a new csv output file. So my output would look something like this:

Output.csv:

FQDN, IP_ADDRESS, SERVER, LOCATION  
device679.com,20.3.67.1,serverA,we  
device345.com,192.168.2.0,serverA,ad  
and so on..

I have tried, but not able to get the output.

This is my Code, please tell me where i am going wrong;

import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist:
 reader1 = csv.reader(lookuplist)
 for col in reader1:
    DATA[col[0]] = col[1]

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
 reader2 = csv.reader(csvinput)
 csv_output = csv.writer(f_output)
 fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER'])
 csv_output.writerow(fieldnames)  # prints header to the output file

    for col in reader1:
     if col[1] not in reader2:
        csv_output.writerow(col)

(EDIT) This is another approach that I have used:

import csv

f1 = (open("CSV1.csv"))
f2 = (open("CSV2.csv"))

csv_f1 = csv.reader(f1)
csv_f2 = csv.reader(f2)

for col1, col2 in zip(csv_f1, csv_f2):
    if col2[0] not in col1[1]:
    print(col2[0])

Basically, here I am only trying to find out first whether the unmatched FQDNs are printed or not. But it is printing out the whole CSV1 column instead. Please help guys, lot of research has went into this, but found no luck yet! :(

You might want to have a look at the built-in difflib module: docs.python.org/3.6/library/difflib.html — Uri Goren
– Uri Goren, Commented Jun 1, 2018 at 21:58
@UriGoren, I am fairly new to Python and I'm cracking terms that comes along the way. Can you please help with any other method where we can compare both of these column and write it to a csv file please? I have been working on this code but found no luck yet. — Karan M
– Karan M, Commented Jun 1, 2018 at 23:05
I've added a detailed answer below, If my answer helps you please accept it or at least vote up. thanks. — Uri Goren
– Uri Goren, Commented Jun 2, 2018 at 14:59
@UriGoren I ran this code, but I am getting the rows which are not there in CSV1, I want specifically the rows from CSV1 which are not there in CSV2. CSV1 file has 405 entries and CSV2 has around 16k, but it is missing entries that are in CSV1 and I need those to write to a file. — Karan M
– Karan M, Commented Jun 2, 2018 at 22:56
Please read the documentation of difflib, you can use the dmode variable in my code to specify the kind of rows you want to filter (only in 1, only in 2, common to both). If any of the answers below were helpful - please vote up or accept them. — Uri Goren
– Uri Goren, Commented Jun 3, 2018 at 7:30

Uri Goren · Accepted Answer · 2018-06-02 12:03:10Z

2

This code uses the built-in difflib to spit out the lines from file1.csv that don't appear in file2.csv and vice versa.

I use the Differ object for identifying line changes. I assumed that you would not regard line swapping as a difference, that's why I added the sorted() function call.

from difflib import Differ
csv_file1 = sorted(open("file1.csv", 'r').readlines())
csv_file2 = sorted(open("file2.csv", 'r').readlines())
with open("diff.csv", 'w') as f:
    for line in Differ().compare(csv_file1,csv_file2)):
        dmode, line = line[:2], line[2:]
        if dmode.strip() == "":
            continue
        f.write(line + "\n")

Note that if the line differs somehow (not only in the FQDN column) it would appear in diff.csv

answered Jun 2, 2018 at 12:03

Uri Goren

13.8k8 gold badges62 silver badges113 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Karan M Over a year ago

Thanks for your help, but I figured out the code by myself. I'll be posting the code after final touches.

Gopal Gautam · Accepted Answer · 2018-06-03 04:25:17Z

0

import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist, open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w') as f_output:
 reader1 = csv.reader(lookuplist)
 reader2 = csv.reader(csvinput)
 csv_output = csv.writer(f_output)
 fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION'])
 csv_output.writerow(fieldnames)  # prints header to the output file
 _tempFqdn = []
 for i,dt in enumerate(reader1):
     if i==0:
         continue
     _tempFqdn.append(dt[1].strip())
 for i,col in enumerate(reader2):
     if i==0:
         continue
     if col[0].strip() not in _tempFqdn:
         csv_output.writerow(col)

answered Jun 3, 2018 at 4:25

Gopal Gautam

3691 gold badge3 silver badges9 bronze badges

Comments

Picachieu · Accepted Answer · 2019-03-30 23:50:53Z

-1

import csv

data = {}  # creating dictionary to store the data

with open('CSV1.csv', 'r') as lookuplist:
reader1 = csv.reader(lookuplist)
for col in reader1:
    data[col[1]] = col[1]  # stores the data from column 0 to column 1 in the data list

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
reader2 = csv.reader(csvinput)
csv_output = csv.writer(f_output)
fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
               'MHT_3'])
csv_output.writerow(fieldnames)  # prints header to the output file

for col in reader2:
    if col[0] not in data:  # if the column 1 in CSV1 does not match with column 0 in CSV2 Extract
    col = [col[0]]


        csv_output.writerow(col)  # writes all the data that is matched in CMDB WLC Extract

So basically, I only had to change 'not in' under 'for loop' and change the columns in the data list that will be reading from the CSV1 file that I am creating.

edited Mar 30, 2019 at 23:50

Picachieu

3,81010 gold badges29 silver badges47 bronze badges

answered Jun 4, 2018 at 14:09

Karan M

3093 silver badges18 bronze badges

1 Comment

Karan M Over a year ago

@down voter, why a down vote? I am able to run the script absolutely fine.

Collectives™ on Stack Overflow

Compare a column between 2 csv files and write differences using Python

3 Answers 3

1 Comment

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related