For every row in df_a, I want to find the rows in df_b where the ids match and the df_a row's location falls between the df_b row's location_start and location_end.
df_a looks like:
|---------------------|------------------|------------------|
| Name | id | location |
|---------------------|------------------|------------------|
| a | 1 | 202013 |
|---------------------|------------------|------------------|
df_b looks like:
|---------------------|------------------|------------------|------------------|
| Name | id | location_start | location_end |
|---------------------|------------------|------------------|------------------|
| x | 1 | 202010 | 2020199 |
|---------------------|------------------|------------------|------------------|
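For reproducibility, the two sample frames above can be built as follows. This is a minimal sketch: it assumes pandas, and it assumes that id and the location columns are stored as integers, which the tables suggest but do not confirm.

```python
import pandas as pd

# Sample rows copied from the tables above.
# Integer dtypes are an assumption; the real columns may be strings or floats.
df_a = pd.DataFrame({"Name": ["a"], "id": [1], "location": [202013]})
df_b = pd.DataFrame(
    {
        "Name": ["x"],
        "id": [1],
        "location_start": [202010],
        "location_end": [2020199],
    }
)
```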
Unfortunately, df_a and df_b each have nearly a million rows, and the code below takes about 10 hours to run on my local machine:
for index, row in df_a.iterrows():
    matched = df_b[(df_b['location_start'] < row['location'])
                   & (df_b['location_end'] > row['location'])
                   & (df_b['id'] == row['id'])]
Is there any obvious way to speed this up?
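One direction that may be worth trying is to join the two frames on id once and then apply the range filter to the whole joined frame, instead of filtering df_b once per row of df_a. The sketch below is not benchmarked on the real data; it rebuilds the one-row samples so it runs on its own, and the `_a`/`_b` suffixes are only an illustrative choice for the clashing Name columns. If a single id appears many times in both frames, the intermediate join can become much larger than either input, so it may need to be processed in chunks.

```python
import pandas as pd

# One-row samples from the question (integer dtypes are an assumption).
df_a = pd.DataFrame({"Name": ["a"], "id": [1], "location": [202013]})
df_b = pd.DataFrame(
    {
        "Name": ["x"],
        "id": [1],
        "location_start": [202010],
        "location_end": [2020199],
    }
)

# Pair every df_a row with every df_b row that shares its id.
merged = df_a.merge(df_b, on="id", suffixes=("_a", "_b"))

# Keep only pairs where location lies strictly inside the interval,
# mirroring the strict < and > comparisons in the original loop.
matched = merged[
    (merged["location_start"] < merged["location"])
    & (merged["location_end"] > merged["location"])
]
```

Unlike the loop, which overwrites matched on every iteration, this yields a single frame with one row per (df_a row, df_b row) match.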