
I have two CSV files with the same number of columns and the same format, where each row contains details about a server. Each file refers to a different day.

I want to compare the Size(GB) column (column D) of each server (row) in the Day2 CSV file against the same server in the Day1 CSV file, and write the result either to column E of the Day2 CSV file or to a separate third CSV file, so that I can track the difference/growth in size every day.

I am trying to do this in Python.

Here is an example:

day1.csv

Server  Site      Platform  Size(GB)
a       Primary   Windows   100
b       Secondary Unix      200
c       Primary   Oracle    500

day2.csv

Server  Site      Platform  Size(GB)
a       Primary   Windows   150
b       Secondary Unix      100
c       Primary   Oracle    500

Expected Result output.csv

Server  Site      Platform  Size(GB) Growth(GB)
a       Primary   Windows   150      50
b       Secondary Unix      100      -100
c       Primary   Oracle    500      0
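For reference, the growth in the expected result is Day2 size minus Day1 size for the same server: 150 - 100 = 50, 100 - 200 = -100 and 500 - 500 = 0. A minimal sketch of that calculation, matching rows by the Server value rather than by row position. The function name compute_growth and the tuple layout are assumptions for illustration, and a server missing from Day1 is assumed to start from 0 GB:

```python
def compute_growth(day1_rows, day2_rows):
    """Return Day2 rows with a Growth(GB) value appended.

    Each row is a (server, site, platform, size_gb) tuple. Rows are matched
    by server name, so the two days may list servers in different orders.
    Assumption: a server absent from Day1 is treated as growing from 0 GB.
    """
    # Index the Day1 sizes by server name for constant-time lookup
    day1_sizes = {server: size for server, _site, _platform, size in day1_rows}
    result = []
    for server, site, platform, size in day2_rows:
        growth = size - day1_sizes.get(server, 0)
        result.append((server, site, platform, size, growth))
    return result


day1 = [("a", "Primary", "Windows", 100),
        ("b", "Secondary", "Unix", 200),
        ("c", "Primary", "Oracle", 500)]
day2 = [("a", "Primary", "Windows", 150),
        ("b", "Secondary", "Unix", 100),
        ("c", "Primary", "Oracle", 500)]

for row in compute_growth(day1, day2):
    print(row)
# ('a', 'Primary', 'Windows', 150, 50)
# ('b', 'Secondary', 'Unix', 100, -100)
# ('c', 'Primary', 'Oracle', 500, 0)
```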

EDIT 1:

This is the code I have developed so far:

import csv 
t1 = open('/day1.csv', 'r') 
t2 = open('/day2.csv', 'r') 
outputt=open("/growth.csv","w") 
fileone = t1.readlines() 
filetwo = t2.readlines() 

for line in filetwo: 
    row = row.split(',') 
    a = str(row[0]) 
    b = str(row[1]) 
    c = str(row[2]) 
    d = float(row[3]) 
    f = float(filetwo.row[3] - fileone.row[3])
    outputt.writerow([a,b,c,d,e,f]) 
    outputt.write(line.replace("\n","") + ";6column\n")
outputt.close()
fileone.close()
  • Although the question is pretty complete, I suggest you provide your current Python code for solving this problem. This will allow us to help you further! Commented Sep 4, 2017 at 12:02
  • @CristianRamon-Cortes Please find the code above. This is my draft so far. Commented Sep 4, 2017 at 13:18
  • (Posted the draft code shown above as a comment.) Commented Sep 4, 2017 at 14:13
  • I have added the code from your reply to the question. Please edit the question when adding more information, so that others can check it. Commented Sep 4, 2017 at 14:46

3 Answers


It is not a very general solution, but I tried to follow your approach as much as possible:

import csv

# Open read files (Python 3: newline='' is what the csv module expects)
file1 = open('day1.csv', 'r', newline='')
file2 = open('day2.csv', 'r', newline='')

# Open output file
outputFile = open('day3.csv', 'w', newline='')
csvWriter = csv.writer(outputFile, delimiter=',')
# Write the output file header
csvWriter.writerow(["Server", "Site", "Platform", "Size(GB)", "Growth(GB)"])

# Process input files
csvReader1 = csv.reader(file1, delimiter=',')
csvReader2 = csv.reader(file2, delimiter=',')

# Skip headers (next() works on Python 2.6+ and Python 3; reader.next() is Python 2 only)
next(csvReader1)
next(csvReader2)

# Process data (assumes both files list the same servers in the same order)
for rowF2 in csvReader2:
    # Get the matching line from F1
    rowF1 = next(csvReader1)

    # Uncomment for debug
    #print(rowF1)
    #print(rowF2)

    # Construct output line from F2 values
    colA = rowF2[0]
    colB = rowF2[1]
    colC = rowF2[2]
    colD = rowF2[3]
    # Compute the growth (Day2 size minus Day1 size); negative values keep their minus sign
    colE = str(int(rowF2[3]) - int(rowF1[3]))

    # Write the output file: all five columns, matching the header
    csvWriter.writerow([colA, colB, colC, colD, colE])

file1.close()
file2.close()
outputFile.close()

From my point of view, the biggest concerns were:

  • You need to use the csv module (csv.reader and csv.writer)
  • You need to skip the headers when required
  • You need to close all the files at the end of the execution
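The three points in the list above can also be handled with csv.DictReader/csv.DictWriter and a with statement: DictReader consumes the header row itself, and the with block closes every file when it exits. A minimal Python 3 sketch; the function name write_growth is hypothetical, and like the answer's code it assumes both files share the header Server,Site,Platform,Size(GB) and list the same servers in the same order:

```python
import csv
import os
import tempfile


def write_growth(day1_path, day2_path, out_path):
    """Write the Day2 rows plus a Growth(GB) column to out_path.

    Assumes both inputs have the header Server,Site,Platform,Size(GB)
    and list the same servers in the same order.
    """
    with open(day1_path, newline="") as f1, open(day2_path, newline="") as f2, \
            open(out_path, "w", newline="") as out:
        # DictReader reads the header row itself, so no manual skip is needed
        reader1 = csv.DictReader(f1)
        reader2 = csv.DictReader(f2)
        fieldnames = reader2.fieldnames + ["Growth(GB)"]
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        # Pair rows by position, as the answer's loop does
        for row1, row2 in zip(reader1, reader2):
            row2["Growth(GB)"] = int(row2["Size(GB)"]) - int(row1["Size(GB)"])
            writer.writerow(row2)
    # All three files are closed here, when the with block exits


# Usage: build the two example files in a temporary directory
tmp = tempfile.mkdtemp()
paths = {name: os.path.join(tmp, name) for name in ("day1.csv", "day2.csv", "output.csv")}
with open(paths["day1.csv"], "w", newline="") as f:
    f.write("Server,Site,Platform,Size(GB)\na,Primary,Windows,100\n"
            "b,Secondary,Unix,200\nc,Primary,Oracle,500\n")
with open(paths["day2.csv"], "w", newline="") as f:
    f.write("Server,Site,Platform,Size(GB)\na,Primary,Windows,150\n"
            "b,Secondary,Unix,100\nc,Primary,Oracle,500\n")

write_growth(paths["day1.csv"], paths["day2.csv"], paths["output.csv"])
with open(paths["output.csv"], newline="") as f:
    for row in csv.reader(f):
        print(",".join(row))
```

If the files may list servers in different orders, pairing with zip() is not enough; indexing the Day1 rows by the Server value first would be the safer choice.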

This can be done using Python's csv module and an OrderedDict to maintain the original file order:

from collections import OrderedDict
import csv

with open('day1.csv', 'rb') as f_day1, open('day2.csv', 'rb') as f_day2:
    csv_day1 = csv.reader(f_day1)
    csv_day2 = csv.reader(f_day2)

    header = next(csv_day1) + ['Growth(GB)']
    next(csv_day2)

    day1 = OrderedDict([row[0], [row[1], row[2], int(row[3])]] for row in csv_day1)
    day2 = OrderedDict([row[0], [row[1], row[2], int(row[3])]] for row in csv_day2)

with open('output.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)

    for server, data in day1.items():
        data.append(day2[server][2] - data[2])
        data[2] = day2[server][2]
        csv_output.writerow([server] + data)

Giving you an output CSV file as follows:

Server,Site,Platform,Size(GB),Growth(GB)
a,Primary,Windows,150,50
b,Secondary,Unix,100,-100
c,Primary,Oracle,500,0

Note: files are automatically closed when a with statement is used.

Tested on Python 2.7.12
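The note about with can be checked directly through a file object's closed attribute. A tiny sketch that should behave the same on Python 2.7 and Python 3; the file name example.csv is hypothetical and is created in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.csv")

with open(path, "w") as f:
    f.write("Server,Site,Platform,Size(GB)\n")
    # Inside the block the file is still open
    print(f.closed)  # False

# Leaving the with block closes the file, even if an exception was raised inside it
print(f.closed)  # True
```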

3 Comments

Actually, I am getting results with both of the above scripts. Thanks for your time, but I am not getting the minus sign when the capacity has decreased. My plan was to highlight the growth in coloured text based on its value: either positive (red) or negative (green).
I have added a separate growth column. It should now look like your expected outcome.
Thanks for your help, Martin. Much appreciated.
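Subtracting the Day1 size from the Day2 size already produces negative numbers for servers that shrank (100 - 200 gives -100), while a CSV file is plain text and cannot hold colour itself. To make the sign explicit, and to colour it when printing to a terminal, one option is the "+d" format specification together with ANSI escape codes. A minimal sketch: the helper format_growth is hypothetical, the red/green mapping follows the comment above, and the colours only show up in terminals that interpret ANSI escape sequences:

```python
RED = "\033[31m"    # ANSI escape: red foreground
GREEN = "\033[32m"  # ANSI escape: green foreground
RESET = "\033[0m"   # ANSI escape: reset all attributes


def format_growth(growth, colour=False):
    """Return growth with an explicit sign, e.g. '+50', '-100', '+0'.

    With colour=True, positive growth is wrapped in red and negative growth
    in green (zero stays uncoloured). Meant for terminal output, not the CSV.
    """
    text = format(growth, "+d")
    if not colour or growth == 0:
        return text
    return (RED if growth > 0 else GREEN) + text + RESET


for g in (50, -100, 0):
    print(format_growth(g), format_growth(g, colour=True))
```

Writing format_growth(g) without colour into the Growth(GB) column keeps the CSV plain while still showing a leading "+" or "-".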
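import pandas as pd

# Assumption (not shown in the original answer): both files are loaded into
# DataFrames with the same shape and column labels, and `difference` keeps
# only the cells that differ, so matched cells become NaN.
df1 = pd.read_csv('day1.csv')
df2 = pd.read_csv('day2.csv')
difference = df2[df1 != df2]
output_file = 'output.csv'  # hypothetical output path

# Show True/False for each column that contains NaN (matched data)
print(difference.isnull().any())

# Count of NaN (matched data) in each column
print(difference.isnull().sum())

# Count of mismatched data in each column
print(difference.count())

# Drop rows where every cell matched, keeping only the differing records
df = difference.dropna(axis=0, how='all')

# Save the differences to output_file
df.to_csv(output_file)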
