Compare two csv files with python pandas and create a third file with the produced dataframe

Question

I have two large CSV files (each has around a million rows and about 70 columns, and the column names differ between the files). I want to perform a SQL-style left join using Python pandas and write the result to a new CSV file.

The same operation can be expressed in SQL with the query below:

select opportunities.* , data_dump.OpportunityID
 from opportunities 
 left join data_dump on (opportunities.LeadIdentifier=data_dump.LeadId and opportunities.ProductSku=data_dump.ProductName)

I was thinking of doing something like this, but it is very inefficient for data this large:

import pandas as pd

fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
data_obj = fetched_opportunities.to_dict(orient='records')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')
data_obj2 = fetched_opportunities2.to_dict(orient='records')
# Compares every opportunities row with every data_dump row (O(n * m))
for opportunity_detail2 in data_obj2:
    for opportunity_detail1 in data_obj:
        if opportunity_detail2['LeadIdentifier'] == opportunity_detail1['LeadId'] and opportunity_detail2['ProductSku'] == opportunity_detail1['ProductName']:
            opportunity_detail2['OpportunityID'] = opportunity_detail1['OpportunityID']

Walid · Accepted Answer · 2020-12-31 23:41:36Z


Try the merge function, with opportunities as the left frame so that every opportunity row is kept, as in your SQL:

fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')

out = fetched_opportunities2.merge(
    fetched_opportunities[["OpportunityID", "LeadId", "ProductName"]],
    how='left', left_on=['LeadIdentifier', 'ProductSku'], right_on=['LeadId', 'ProductName'],
).drop(["LeadId", "ProductName"], axis=1)
out.to_csv(path + "/result.csv", index=False)  # "result.csv" is a placeholder output name
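
To see what the left join from the question produces, here is a minimal sketch on toy in-memory frames. It assumes the column names from the question; the sample values, the extra `Amount` and `Unrelated` columns, and the helper name `left_join_opportunities` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the key and OpportunityID column names come from the question.
opportunities = pd.DataFrame({
    "LeadIdentifier": ["L1", "L2", "L3"],
    "ProductSku": ["A", "B", "C"],
    "Amount": [100, 200, 300],
})
data_dump = pd.DataFrame({
    "LeadId": ["L1", "L2", "L9"],
    "ProductName": ["A", "X", "C"],
    "OpportunityID": ["O-1", "O-2", "O-9"],
    "Unrelated": [1, 2, 3],
})


def left_join_opportunities(opportunities, data_dump):
    """Keep every opportunities row and attach OpportunityID where both keys match."""
    # Take only the join keys plus the single column wanted from data_dump.
    lookup = data_dump[["LeadId", "ProductName", "OpportunityID"]]
    merged = opportunities.merge(
        lookup,
        how="left",
        left_on=["LeadIdentifier", "ProductSku"],
        right_on=["LeadId", "ProductName"],
    )
    # The right-hand key columns duplicate the left-hand ones, so drop them.
    return merged.drop(columns=["LeadId", "ProductName"])


result = left_join_opportunities(opportunities, data_dump)
print(result)
```

Only the first row matches on both keys, so the other two rows keep their opportunities data and get a missing `OpportunityID`. If data_dump can hold several rows for the same key pair, a left join repeats the matching opportunities row once per match, just as the SQL would.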

edited Dec 31, 2020 at 23:41

answered Dec 31, 2020 at 23:11


2 Comments

Rishabh Jain Over a year ago

Also, how can I make sure that the new csv has all the columns from fetched_opportunities2 dataframe and only one column (OpportunityId) from fetched_opportunities dataframe, like we will get from this sql -

Rishabh Jain Over a year ago

select opportunities.* , data_dump.OpportunityID from opportunities left join data_dump on (opportunities.LeadIdentifier=data_dump.LeadId and opportunities.ProductSku=data_dump.ProductName)
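
One possible way to get exactly that column set straight from the files is sketched below. It assumes the column names from the question. The function name `join_csv_files`, the `Stage` column, the output file name, and the temporary demo files are hypothetical. Passing `usecols` to `read_csv` means only the two key columns and `OpportunityID` are read from data_dump, which also lowers memory use on large files.

```python
import os
import tempfile

import pandas as pd


def join_csv_files(opportunities_csv, data_dump_csv, output_csv):
    """Left-join opportunities with data_dump and write the result to output_csv."""
    opportunities = pd.read_csv(opportunities_csv)
    # Read only the join keys and the one column wanted from data_dump.
    data_dump = pd.read_csv(
        data_dump_csv, usecols=["LeadId", "ProductName", "OpportunityID"]
    )
    merged = opportunities.merge(
        data_dump,
        how="left",
        left_on=["LeadIdentifier", "ProductSku"],
        right_on=["LeadId", "ProductName"],
    ).drop(columns=["LeadId", "ProductName"])
    merged.to_csv(output_csv, index=False)
    return merged


# Demo with small temporary files standing in for the real CSVs.
with tempfile.TemporaryDirectory() as tmp:
    opp_path = os.path.join(tmp, "opportunities.csv")
    dump_path = os.path.join(tmp, "data_dump.csv")
    out_path = os.path.join(tmp, "result.csv")

    pd.DataFrame({
        "LeadIdentifier": ["L1", "L2"],
        "ProductSku": ["A", "B"],
        "Stage": ["Open", "Won"],
    }).to_csv(opp_path, index=False)
    pd.DataFrame({
        "LeadId": ["L1", "L7"],
        "ProductName": ["A", "B"],
        "OpportunityID": ["O-1", "O-7"],
        "Owner": ["x", "y"],
    }).to_csv(dump_path, index=False)

    join_csv_files(opp_path, dump_path, out_path)
    written = pd.read_csv(out_path)

print(written)
```

The written file contains every column of opportunities followed by `OpportunityID`, which corresponds to `opportunities.*, data_dump.OpportunityID` in the SQL.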
