Python: Comparing 2 CSV files for the difference in 1 column value and writing the output to a 3rd CSV file

Question

I have 2 CSV files with the same columns and format, where each row contains details about one server. Each file refers to a different day.

For each server (row) in the Day 2 CSV file, I want to compare the Size (GB) column (column D) against the same server's Size (GB) value in the Day 1 CSV file, and write the difference either to column E of the Day 2 CSV file or to a separate third CSV file, so that I can track the growth in size every day.

I am trying to achieve this in Python.

Here is an example:

day1.csv

Server  Site      Platform  Size(GB)
a       Primary   Windows   100
b       Secondary Unix      200
c       Primary   Oracle    500

day2.csv

Server  Site      Platform  Size(GB)
a       Primary   Windows   150
b       Secondary Unix      100
c       Primary   Oracle    500

Expected Result output.csv

Server  Site      Platform  Size(GB) Growth(GB)
a       Primary   Windows   150      50
b       Secondary Unix      100      -100
c       Primary   Oracle    500      0

EDIT 1:

This is the code I have developed so far:

import csv 
t1 = open('/day1.csv', 'r') 
t2 = open('/day2.csv', 'r') 
outputt=open("/growth.csv","w") 
fileone = t1.readlines() 
filetwo = t2.readlines() 

for line in filetwo: 
    row = row.split(',') 
    a = str(row[0]) 
    b = str(row[1]) 
    c = str(row[2]) 
    d = float(row[3]) 
    f = float(filetwo.row[3] - fileone.row[3])
    outputt.writerow([a,b,c,d,e,f]) 
    outputt.write(line.replace("\n","") + ";6column\n") outputt.close() 
    fileone.close()

Although the question is pretty complete, I suggest you provide your current Python code for this problem. This will allow us to help you further!
– Cristian Ramon-Cortes, Commented Sep 4, 2017 at 12:02
@CristianRamon-Cortes Please find the code above. This is my draft so far.
– Gokul R, Commented Sep 4, 2017 at 13:18
import csv t1 = open('/day1.csv', 'r') t2 = open('/day2.csv', 'r') outputt=open("/growth.csv","w") fileone = t1.readlines() filetwo = t2.readlines() for line in filetwo: row = row.split(',') a = str(row[0]) b = str(row[1]) c = str(row[2]) d = float(row[3]) f = float(filetwo.row[3] - fileone.row[3]) outputt.writerow([a,b,c,d,e,f]) outputt.write(line.replace("\n","") + ";6column\n") outputt.close() fileone.close()
– Gokul R, Commented Sep 4, 2017 at 14:13
I have added the code from your reply to the question. Please edit the question when adding more information so that anyone else can check it.
– Cristian Ramon-Cortes, Commented Sep 4, 2017 at 14:46

Cristian Ramon-Cortes · Accepted Answer · 2017-09-04 15:19:00Z

This is not a very general solution (it pairs the rows of both files by position, so the servers must appear in the same order), but I tried to follow your approach as closely as possible:

import csv

# Open read files (Python 3: the csv module expects newline='')
file1 = open('day1.csv', 'r', newline='')
file2 = open('day2.csv', 'r', newline='')

# Open output file
outputFile = open('day3.csv', 'w', newline='')
csvWriter = csv.writer(outputFile, delimiter=',')
# Write the output file header
csvWriter.writerow(["Server", "Site", "Platform", "Size", "Growth"])

# Process input files
csvReader1 = csv.reader(file1, delimiter=',')
csvReader2 = csv.reader(file2, delimiter=',')

# Skip headers
next(csvReader1)
next(csvReader2)

# Process data
for rowF2 in csvReader2:
    # Get the line at the same position in F1
    rowF1 = next(csvReader1)

    # Uncomment for debug
    #print(rowF1)
    #print(rowF2)

    # Construct output line from F2 values
    colA = rowF2[0]
    colB = rowF2[1]
    colC = rowF2[2]
    colD = rowF2[3].strip()
    # Compute the growth (day 2 size minus day 1 size)
    colE = int(rowF2[3]) - int(rowF1[3])

    # Write the output file (one value per header column)
    csvWriter.writerow([colA, colB, colC, colD, colE])

file1.close()
file2.close()
outputFile.close()

From my point of view, the main issues in your code were:

You need to use the CSV library (csv reader and writer) instead of splitting lines by hand
You need to skip the header row of each input file
You need to close all the files at the end of the execution
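
The three points above can also be sketched more compactly for Python 3. This is only an illustration, not code from the answer: the function name write_growth is hypothetical, it relies on "with" to close the files automatically, and it assumes that both files have a header row, hold the size in the fourth column, and list the same servers in the same order.

```python
import csv


def write_growth(day1_path, day2_path, out_path):
    """Pair the rows of two CSV files by position and write the day 2 rows
    plus a growth column.

    Hypothetical helper: assumes a header row in each file, the size in the
    fourth column, and the same servers in the same order in both files.
    """
    # 'with' closes all three files, even if an exception is raised
    with open(day1_path, newline='') as f1, \
         open(day2_path, newline='') as f2, \
         open(out_path, 'w', newline='') as out:
        reader1 = csv.reader(f1)
        reader2 = csv.reader(f2)
        writer = csv.writer(out)

        # Skip the day 1 header; reuse the day 2 header for the output
        next(reader1)
        header = next(reader2)
        writer.writerow(header + ['Growth(GB)'])

        # zip() walks both files in step, one row from each per iteration
        for row1, row2 in zip(reader1, reader2):
            growth = int(row2[3]) - int(row1[3])
            writer.writerow(row2 + [growth])
```

Calling write_growth('day1.csv', 'day2.csv', 'day3.csv') would then produce the output file in one step. If the servers can appear in a different order, rows should be matched by server name rather than by position.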

Martin Evans · Accepted Answer · 2017-10-02 10:59:02Z

This could be done using Python's CSV library, and an OrderedDict to maintain the original file order:

from collections import OrderedDict
import csv

with open('day1.csv', 'rb') as f_day1, open('day2.csv', 'rb') as f_day2:
    csv_day1 = csv.reader(f_day1)
    csv_day2 = csv.reader(f_day2)

    header = next(csv_day1) + ['Growth(GB)']
    next(csv_day2)

    day1 = OrderedDict([row[0], [row[1], row[2], int(row[3])]] for row in csv_day1)
    day2 = OrderedDict([row[0], [row[1], row[2], int(row[3])]] for row in csv_day2)

with open('output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)

    for server, data in day1.items():
        data.append(day2[server][2] - data[2])
        data[2] = day2[server][2]
        csv_output.writerow([server] + data)

Giving you an output CSV file as follows:

Server,Site,Platform,Size(GB),Growth(GB)
a,Primary,Windows,150,50
b,Secondary,Unix,100,-100
c,Primary,Oracle,500,0

Note: Files are automatically closed when with is used.

Tested on Python 2.7.12
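
The script above targets Python 2, where the csv module works with files opened in binary mode. On Python 3 the csv module expects text-mode files opened with newline='', so the 'rb' and 'wb' modes would need to change. Below is a minimal sketch of the same idea for Python 3, not a tested port of the answer. The function name growth_by_server is hypothetical, a plain dict is used for the lookup, and the output follows the order of the day 2 file. As a further assumption, the growth is left empty for servers that do not appear in the day 1 file.

```python
import csv


def growth_by_server(day1_path, day2_path, out_path):
    """Match servers by name and write the day 2 rows plus a growth column.

    Hypothetical helper: assumes the server name is in the first column
    and the size in the fourth column of both files.
    """
    # Map server name -> day 1 size
    with open(day1_path, newline='') as f_day1:
        csv_day1 = csv.reader(f_day1)
        header = next(csv_day1) + ['Growth(GB)']
        day1_size = {row[0]: int(row[3]) for row in csv_day1}

    with open(day2_path, newline='') as f_day2, \
         open(out_path, 'w', newline='') as f_output:
        csv_day2 = csv.reader(f_day2)
        next(csv_day2)  # skip the day 2 header
        csv_output = csv.writer(f_output)
        csv_output.writerow(header)

        for row in csv_day2:
            server, size = row[0], int(row[3])
            # Assumption: leave growth empty for servers not seen on day 1
            growth = size - day1_size[server] if server in day1_size else ''
            csv_output.writerow(row[:3] + [size, growth])
```

Because rows are looked up by server name, the two files do not need to list the servers in the same order.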

edited Oct 2, 2017 at 10:59

answered Sep 4, 2017 at 15:05

Martin Evans

3 Comments

Gokul R Over a year ago

Actually, I am getting results with both of the above scripts. Thanks for your time, but I am not getting the minus sign when the capacity has decreased. My plan was to highlight the growth in coloured text based on the value: either positive (red) or negative (green).

Martin Evans Over a year ago

I have added a separate growth column. It should now look like your expected output.

Gokul R Over a year ago

Thanks for your help Martin. Much appreciated

Skandix · Accepted Answer · 2018-03-20 19:22:26Z

# Assumes `difference` is a pandas DataFrame built earlier (not shown in
# this answer) that holds NaN wherever the two CSV files match

# Show True/False for each column: does it contain any NaN (matched data)?
print(difference.isnull().any())

# Count of NaN (matched data) in each column
print(difference.isnull().sum())

# Count of mismatched data in each column
print(difference.count())

# Keep only the records that differ between the 2 CSV files
df = difference.dropna(axis=0, how='all')

# Save the result; `output_file` is the output path, defined elsewhere
df.to_csv(output_file)

edited Mar 20, 2018 at 19:22

Skandix

answered Mar 20, 2018 at 17:48

Avinash Singh
