I have two large CSV files (each has around a million rows and about 70 columns, and the column names differ between the files). I want to perform a SQL-like left join using Python pandas and write the result to a new CSV file.

The same operation can be achieved in SQL with the query below:

select opportunities.* , data_dump.OpportunityID
 from opportunities 
 left join data_dump on (opportunities.LeadIdentifier=data_dump.LeadId and opportunities.ProductSku=data_dump.ProductName)

I was thinking of doing something like this, but it is very inefficient for data this large:

import pandas as pd

fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
data_obj = fetched_opportunities.to_dict(orient='records')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')
data_obj2 = fetched_opportunities2.to_dict(orient='records')
# Compare every opportunities row with every data_dump row (O(n * m) comparisons)
for opportunity_detail2 in data_obj2:
    for opportunity_detail1 in data_obj:
        if opportunity_detail2['LeadIdentifier'] == opportunity_detail1['LeadId'] and opportunity_detail2['ProductSku'] == opportunity_detail1['ProductName']:
            ...  # matching pair found

1 Answer

Try using the merge function. To match the SQL, opportunities has to be the left frame so that all of its rows are kept:

import pandas as pd

fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')

# opportunities LEFT JOIN data_dump, taking only OpportunityID from data_dump
out = fetched_opportunities2.merge(
    fetched_opportunities[["OpportunityID", "LeadId", "ProductName"]],
    how='left',
    left_on=['LeadIdentifier', 'ProductSku'],
    right_on=['LeadId', 'ProductName'],
).drop(["LeadId", "ProductName"], axis=1)
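
To see what a left merge on differently named key columns does, here is a minimal sketch on small in-memory frames. The sample rows and the Amount column are made up for illustration; only the key and OpportunityID column names come from the question. With opportunities as the left frame, every opportunities row is kept and OpportunityID is left empty (NaN) where data_dump has no matching pair.

```python
import pandas as pd

# Made-up stand-ins for opportunities.csv and data_dump.csv.
opportunities = pd.DataFrame({
    "LeadIdentifier": ["L1", "L2", "L3"],
    "ProductSku": ["A", "B", "C"],
    "Amount": [100, 200, 300],  # hypothetical extra column
})
data_dump = pd.DataFrame({
    "LeadId": ["L1", "L2"],
    "ProductName": ["A", "X"],
    "OpportunityID": ["OPP-1", "OPP-2"],
})

# opportunities LEFT JOIN data_dump on the two key pairs.
result = opportunities.merge(
    data_dump[["LeadId", "ProductName", "OpportunityID"]],
    how="left",
    left_on=["LeadIdentifier", "ProductSku"],
    right_on=["LeadId", "ProductName"],
).drop(columns=["LeadId", "ProductName"])  # right-hand key copies are redundant

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv instead (together with index=False) writes the joined result to disk.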

2 Comments

Also, how can I make sure that the new CSV has all the columns from the fetched_opportunities2 dataframe and only one column (OpportunityID) from the fetched_opportunities dataframe, as we would get from this SQL:
select opportunities.* , data_dump.OpportunityID from opportunities left join data_dump on (opportunities.LeadIdentifier=data_dump.LeadId and opportunities.ProductSku=data_dump.ProductName)
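
One way to get exactly that column layout is to pass only the join keys plus OpportunityID into the merge and then select the output columns explicitly. The sketch below is a suggestion rather than a definitive implementation: add_opportunity_id is a hypothetical helper name, the sample rows and the Stage and Extra columns are made up, and it assumes opportunities has no columns of its own named LeadId, ProductName or OpportunityID (otherwise pandas would add suffixes to the clashing names).

```python
import pandas as pd

def add_opportunity_id(opportunities: pd.DataFrame, data_dump: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: opportunities.* plus data_dump.OpportunityID."""
    # Keep only the join keys and the single column wanted from data_dump.
    right = data_dump[["LeadId", "ProductName", "OpportunityID"]]
    merged = opportunities.merge(
        right,
        how="left",
        left_on=["LeadIdentifier", "ProductSku"],
        right_on=["LeadId", "ProductName"],
    )
    # Explicit selection: every opportunities column, then OpportunityID.
    return merged[list(opportunities.columns) + ["OpportunityID"]]

# Made-up sample data; data_dump has a duplicated key pair and an unrelated column.
opportunities = pd.DataFrame({
    "LeadIdentifier": ["L1", "L2"],
    "ProductSku": ["A", "B"],
    "Stage": ["open", "won"],
})
data_dump = pd.DataFrame({
    "LeadId": ["L1", "L1", "L9"],
    "ProductName": ["A", "A", "Z"],
    "OpportunityID": ["OPP-1", "OPP-2", "OPP-9"],
    "Extra": [1, 2, 3],
})

result = add_opportunity_id(opportunities, data_dump)
print(result)
```

As in SQL, an opportunities row that matches several data_dump rows appears once per match, so the result can have more rows than opportunities. For the real files, reading data_dump with read_csv's usecols argument limited to the three needed columns should also reduce memory use.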
