I have two large CSV files. Each has around a million rows and around 70 columns, and the column names differ between the two files. I want to perform a SQL-style left join using Python pandas and write the result to a new CSV file.
The same operation can be expressed in SQL with the query below:
select opportunities.* , data_dump.OpportunityID
from opportunities
left join data_dump on (opportunities.LeadIdentifier=data_dump.LeadId and opportunities.ProductSku=data_dump.ProductName)
I was thinking of doing something like this, but it is very inefficient for data of this size:
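import pandas as pd

# `path` is defined elsewhere and points at the folder holding both CSV files
fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
data_obj = fetched_opportunities.to_dict(orient='records')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')
data_obj2 = fetched_opportunities2.to_dict(orient='records')
# Nested loop: every opportunities row is compared with every data_dump row
for opportunity_detail2 in data_obj2:
    for opportunity_detail1 in data_obj:
        if opportunity_detail2['LeadIdentifier'] == opportunity_detail1['LeadId'] and opportunity_detail2['ProductSku'] == opportunity_detail1['ProductName']:
            # Assumed intent: copy the matching OpportunityID, as in the SQL query
            opportunity_detail2['OpportunityID'] = opportunity_detail1['OpportunityID']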
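
For comparison, the same left join can probably be written without any Python-level loop by using `DataFrame.merge` with `how="left"` and separate `left_on`/`right_on` key lists. The following is only a minimal sketch of that idea, not a tuned solution. The function names `left_join_opportunities` and `join_csv_files` and the toy rows are hypothetical, and it assumes `opportunities` has no columns already named `LeadId`, `ProductName`, or `OpportunityID`.

```python
import pandas as pd


def left_join_opportunities(opportunities: pd.DataFrame,
                            data_dump: pd.DataFrame) -> pd.DataFrame:
    """Return opportunities.* plus data_dump.OpportunityID (left join)."""
    # Keep only the join keys and the single column the SQL query selects.
    right = data_dump[["LeadId", "ProductName", "OpportunityID"]]
    merged = opportunities.merge(
        right,
        how="left",
        left_on=["LeadIdentifier", "ProductSku"],
        right_on=["LeadId", "ProductName"],
    )
    # The right-hand key columns only repeat the left-hand keys, so drop them.
    return merged.drop(columns=["LeadId", "ProductName"])


def join_csv_files(opportunities_csv: str, data_dump_csv: str,
                   output_csv: str) -> None:
    """Hypothetical wrapper: read both CSV files, join them, write the result."""
    opportunities = pd.read_csv(opportunities_csv)
    # usecols avoids loading the columns of data_dump that are never used.
    data_dump = pd.read_csv(
        data_dump_csv, usecols=["LeadId", "ProductName", "OpportunityID"]
    )
    result = left_join_opportunities(opportunities, data_dump)
    result.to_csv(output_csv, index=False)


# Toy data (made up for illustration) to show the shape of the result.
opportunities = pd.DataFrame({
    "LeadIdentifier": ["L1", "L2", "L3"],
    "ProductSku": ["A", "B", "C"],
    "Amount": [10, 20, 30],
})
data_dump = pd.DataFrame({
    "LeadId": ["L1", "L2"],
    "ProductName": ["A", "X"],
    "OpportunityID": ["O1", "O2"],
    "Other": [1, 2],
})
result = left_join_opportunities(opportunities, data_dump)
print(result)
```

Only the row that matches on both keys gets an `OpportunityID`; the other rows keep a missing value, as in a SQL left join. If `data_dump` contains several rows for the same key pair, the corresponding `opportunities` row is repeated once per match, which is also what the SQL query would do.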