0

using python, I want to compare two csv files but only compare row2 of the first csv against row0 of the second csv, but print out in a new csv file only the lines where there are no matches for the compared rows.

Example....

currentstudents.csv contains the following information

Susan,Smith,[email protected],8
John,Doe,[email protected],9
Cool,Guy,[email protected],3
Test,User,[email protected],5

previousstudents.csv contains the following information

[email protected]
[email protected]
[email protected]

After comparing the two csv files, a new csv called NewStudents.csv should be written with the following information:

Cool,Guy,[email protected],3

Here is what I have, but this fails to produce what I need....The below code will work, if I omit all data except the email address in the original currentstudents.csv file, but then I dont end up with the needed data in the final csv file.

def newusers():

for line in fileinput.input(r'C:\work\currentstudents.csv', inplace=1):
    print(line.lower(), end='')


with open(r'C:\work\previousstudents.csv', 'r') as t1, open(r'C:\work\currentstudents.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open(r'C:\work\NewStudents.csv', 'w') as outFile:
    for (line[0]) in filetwo:
        if (line[0]) not in fileone:
            outFile.write(line)

Thanks in advance!

0

2 Answers 2

1

This script writes NewStudents.csv:

import csv

with open('sample.csv', newline='') as csvfile1, \
     open('sample2.csv', newline='') as csvfile2, \
     open('NewStudents.csv', 'w', newline='') as csvfile3:

    reader1 = csv.reader(csvfile1)
    reader2 = csv.reader(csvfile2)
    csvwriter = csv.writer(csvfile3)
    
    emails = set(row[0] for row in reader2)

    for row in reader1:
        if row[2] not in emails:
            csvwriter.writerow(row)

The content of NewStudents.csv:

Cool,Guy,[email protected],3
Sign up to request clarification or add additional context in comments.

Comments

0
  • With a pandas option
    • For small files it's not going to matter, but for larger files, the vectorized operations of pandas will be significantly faster than iterating through emails (multiple times) with csv.
  • Read the data with pd.read_csv
  • Merge the data with pandas.DataFrame.merge
    • The columns do not have names in the question, so columns are selected by column index.
  • Select the desired new students with Boolean indexing with [all_students._merge == 'left_only'].
import pandas as pd

# read the two csv files
cs = pd.read_csv('currentstudents.csv', header=None)
ps = pd.read_csv('previousstudents.csv', header=None)

# merge the data
all_students = cs.merge(ps, left_on=2, right_on=0, how='left', indicator=True)

# select only data from left_only
new_students = all_students.iloc[:, :-2][all_students._merge == 'left_only']

# save the data without the index or header
new_students.to_csv('NewStudents.csv', header=False, index=False)

# NewStudents.csv
Cool,Guy,[email protected],3

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.