comparing two csv files in python that have different data sets

Question

using python, I want to compare two csv files but only compare row2 of the first csv against row0 of the second csv, but print out in a new csv file only the lines where there are no matches for the compared rows.

Example....

currentstudents.csv contains the following information

Susan,Smith,[email protected],8
John,Doe,[email protected],9
Cool,Guy,[email protected],3
Test,User,[email protected],5

previousstudents.csv contains the following information

[email protected]
[email protected]
[email protected]

After comparing the two csv files, a new csv called NewStudents.csv should be written with the following information:

Cool,Guy,[email protected],3

Here is what I have, but this fails to produce what I need....The below code will work, if I omit all data except the email address in the original currentstudents.csv file, but then I dont end up with the needed data in the final csv file.

def newusers():

for line in fileinput.input(r'C:\work\currentstudents.csv', inplace=1):
    print(line.lower(), end='')


with open(r'C:\work\previousstudents.csv', 'r') as t1, open(r'C:\work\currentstudents.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open(r'C:\work\NewStudents.csv', 'w') as outFile:
    for (line[0]) in filetwo:
        if (line[0]) not in fileone:
            outFile.write(line)

Thanks in advance!

Andrej Kesely · Accepted Answer · 2020-08-22 23:18:48Z

1

This script writes NewStudents.csv:

import csv

with open('sample.csv', newline='') as csvfile1, \
     open('sample2.csv', newline='') as csvfile2, \
     open('NewStudents.csv', 'w', newline='') as csvfile3:

    reader1 = csv.reader(csvfile1)
    reader2 = csv.reader(csvfile2)
    csvwriter = csv.writer(csvfile3)
    
    emails = set(row[0] for row in reader2)

    for row in reader1:
        if row[2] not in emails:
            csvwriter.writerow(row)

The content of NewStudents.csv:

Cool,Guy,[email protected],3

answered Aug 22, 2020 at 23:18

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Trenton McKinney · Accepted Answer · 2020-08-23 00:11:29Z

With a pandas option
- For small files it's not going to matter, but for larger files, the vectorized operations of pandas will be significantly faster than iterating through emails (multiple times) with csv.
Read the data with pd.read_csv
Merge the data with pandas.DataFrame.merge
- The columns do not have names in the question, so columns are selected by column index.
Select the desired new students with Boolean indexing with [all_students._merge == 'left_only'].
- .iloc[:, :-2] selects all rows, and all but last two columns.

import pandas as pd

# read the two csv files
cs = pd.read_csv('currentstudents.csv', header=None)
ps = pd.read_csv('previousstudents.csv', header=None)

# merge the data
all_students = cs.merge(ps, left_on=2, right_on=0, how='left', indicator=True)

# select only data from left_only
new_students = all_students.iloc[:, :-2][all_students._merge == 'left_only']

# save the data without the index or header
new_students.to_csv('NewStudents.csv', header=False, index=False)

# NewStudents.csv
Cool,Guy,[email protected],3

Collectives™ on Stack Overflow

comparing two csv files in python that have different data sets

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related