I want to compare two CSV files and output the differences as a list or set, using Python and pandas, without iterating over the rows.

file1.csv

  • Alex
  • Johnny
  • Mark
  • Steve
  • Raul

file2.csv

  • Alex
  • Mark

Each CSV file contains one name per row. I want the output to be something like:

difference = ['Johnny', 'Steve', 'Raul'] or {'Johnny', 'Steve', 'Raul'}

  • Are the names unique? If so, you could use a set difference, for instance from NumPy. Commented Feb 15, 2022 at 16:22
  • Yes, they are unique, but can you help me with the code? The files are fairly large, around 50k rows. Commented Feb 15, 2022 at 16:24

1 Answer

NumPy's setdiff1d() function computes the set difference of two arrays: it returns the sorted, unique values that are in the first array but not in the second.

import numpy as np
import pandas as pd

file1 = r'C:\Users\user\Documents\datasets\file1.csv'  # path to file1
file2 = r'C:\Users\user\Documents\datasets\file2.csv'  # path to file2

# header=None: the files have no header row, so the first name is data, not a column label
df1 = pd.read_csv(file1, header=None)
df2 = pd.read_csv(file2, header=None)

# names present in file1 but not in file2, as a sorted list
diff = np.setdiff1d(df1[0], df2[0]).tolist()
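For a quick check without touching the real files, here is a minimal sketch that runs the same comparison on in-memory data. The `io.StringIO` objects are hypothetical stand-ins for file1.csv and file2.csv, and the sketch assumes the files have no header row (hence `header=None`). It also shows two alternatives that are not part of the answer above: a pandas-only `isin` mask, which keeps the original row order, and a plain Python set difference.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-ins for file1.csv and file2.csv
# (assumption: one name per row, no header row).
file1 = io.StringIO("Alex\nJohnny\nMark\nSteve\nRaul\n")
file2 = io.StringIO("Alex\nMark\n")

# With header=None the single column is labelled 0.
names1 = pd.read_csv(file1, header=None)[0]
names2 = pd.read_csv(file2, header=None)[0]

# NumPy: sorted values that are in names1 but not in names2.
diff_sorted = np.setdiff1d(names1, names2).tolist()

# pandas only: a boolean mask that keeps file1's original row order.
diff_ordered = names1[~names1.isin(names2)].tolist()

# Plain Python sets: unordered result.
diff_set = set(names1) - set(names2)

print(diff_sorted)   # ['Johnny', 'Raul', 'Steve']
print(diff_ordered)  # ['Johnny', 'Steve', 'Raul']
```

All three approaches are vectorized or hash-based, so around 50k rows should not be a problem. Which one fits best depends on whether the result should be sorted, keep file order, or be a set.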