4

I am trying to print out the differences by comparing a column between 2 csv files.

CSV1:

SERVER,   FQDN,   IP_ADDRESS,  
serverA, device1.com, 10.10.10.1  
serverA,device2.com,10.11.11.1  
serverC,device3.com,10.12.12.1   
and so on..

CSV2:

FQDN, IP_ADDRESS, SERVER,  LOCATION  
device3.com,10.12.12.1,serverC,xx  
device679.com,20.3.67.1,serverA,we  
device1.com,10.10.10.1,serverA,ac  
device345.com,192.168.2.0,serverA,ad  
device2.com,192.168.6.0,serverB,af  
and so on...

What I am looking to do is to compare the FQDN column and write the differences to a new csv output file. So my output would look something like this:

Output.csv:

FQDN, IP_ADDRESS, SERVER, LOCATION  
device679.com,20.3.67.1,serverA,we  
device345.com,192.168.2.0,serverA,ad  
and so on..

I have tried, but not able to get the output.

This is my Code, please tell me where i am going wrong;

import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist:
 reader1 = csv.reader(lookuplist)
 for col in reader1:
    DATA[col[0]] = col[1]

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
 reader2 = csv.reader(csvinput)
 csv_output = csv.writer(f_output)
 fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER'])
 csv_output.writerow(fieldnames)  # prints header to the output file

    for col in reader1:
     if col[1] not in reader2:
        csv_output.writerow(col)  

(EDIT) This is another approach that I have used:

import csv

f1 = (open("CSV1.csv"))
f2 = (open("CSV2.csv"))

csv_f1 = csv.reader(f1)
csv_f2 = csv.reader(f2)

for col1, col2 in zip(csv_f1, csv_f2):
    if col2[0] not in col1[1]:
    print(col2[0])

Basically, here I am only trying to find out first whether the unmatched FQDNs are printed or not. But it is printing out the whole CSV1 column instead. Please help guys, lot of research has went into this, but found no luck yet! :(

6
  • You might want to have a look at the built-in difflib module: docs.python.org/3.6/library/difflib.html Commented Jun 1, 2018 at 21:58
  • @UriGoren, I am fairly new to Python and I'm cracking terms that comes along the way. Can you please help with any other method where we can compare both of these column and write it to a csv file please? I have been working on this code but found no luck yet. Commented Jun 1, 2018 at 23:05
  • I've added a detailed answer below, If my answer helps you please accept it or at least vote up. thanks. Commented Jun 2, 2018 at 14:59
  • @UriGoren I ran this code, but I am getting the rows which are not there in CSV1, I want specifically the rows from CSV1 which are not there in CSV2. CSV1 file has 405 entries and CSV2 has around 16k, but it is missing entries that are in CSV1 and I need those to write to a file. Commented Jun 2, 2018 at 22:56
  • Please read the documentation of difflib, you can use the dmode variable in my code to specify the kind of rows you want to filter (only in 1, only in 2, common to both). If any of the answers below were helpful - please vote up or accept them. Commented Jun 3, 2018 at 7:30

3 Answers 3

2

This code uses the built-in difflib to spit out the lines from file1.csv that don't appear in file2.csv and vice versa.

I use the Differ object for identifying line changes. I assumed that you would not regard line swapping as a difference, that's why I added the sorted() function call.

from difflib import Differ
csv_file1 = sorted(open("file1.csv", 'r').readlines())
csv_file2 = sorted(open("file2.csv", 'r').readlines())
with open("diff.csv", 'w') as f:
    for line in Differ().compare(csv_file1,csv_file2)):
        dmode, line = line[:2], line[2:]
        if dmode.strip() == "":
            continue
        f.write(line + "\n")

Note that if the line differs somehow (not only in the FQDN column) it would appear in diff.csv

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for your help, but I figured out the code by myself. I'll be posting the code after final touches.
0
import csv

data = {}  # creating list to store the data

with open('CSV1.csv', 'r') as lookuplist, open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w') as f_output:
 reader1 = csv.reader(lookuplist)
 reader2 = csv.reader(csvinput)
 csv_output = csv.writer(f_output)
 fieldnames = (['FQDN', 'IP_ADDRESS', 'SERVER', 'LOCATION'])
 csv_output.writerow(fieldnames)  # prints header to the output file
 _tempFqdn = []
 for i,dt in enumerate(reader1):
     if i==0:
         continue
     _tempFqdn.append(dt[1].strip())
 for i,col in enumerate(reader2):
     if i==0:
         continue
     if col[0].strip() not in _tempFqdn:
         csv_output.writerow(col)

Comments

-1
import csv

data = {}  # creating dictionary to store the data

with open('CSV1.csv', 'r') as lookuplist:
reader1 = csv.reader(lookuplist)
for col in reader1:
    data[col[1]] = col[1]  # stores the data from column 0 to column 1 in the data list

with open('CSV2.csv', 'r') as csvinput, open('Output.csv', 'w', newline='') as f_output:
reader2 = csv.reader(csvinput)
csv_output = csv.writer(f_output)
fieldnames = (['SERVER', 'FQDN', 'AUTOMATION_ADMINISTRATOR', 'IP_ADDRESS', 'PRIMARY_1', 'MHT_1', 'MHT_2',
               'MHT_3'])
csv_output.writerow(fieldnames)  # prints header to the output file

for col in reader2:
    if col[0] not in data:  # if the column 1 in CSV1 does not match with column 0 in CSV2 Extract
    col = [col[0]]


        csv_output.writerow(col)  # writes all the data that is matched in CMDB WLC Extract

So basically, I only had to change 'not in' under 'for loop' and change the columns in the data list that will be reading from the CSV1 file that I am creating.

1 Comment

@down voter, why a down vote? I am able to run the script absolutely fine.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.