I have two data frames. One dataframe (dfA) looks like:
Name gender start_coordinate end_coordinate ID
Peter M 30 150 1
Hugo M 4500 6000 2
Jennie F 300 700 3
The other dataframe (dfB) looks like
Name position string
Peter 89 aa
Jennie 568 bb
Jennie 90 cc
I want to filter data from dfA such that position from dfB falls in the interval of dfA (start coordinate and end coordinate) and names should be same as well. For example, position value of row # 1 of dfB falls in interval specified by row # 1 of dfA and the corresponding name value is also the same therefore, I want this row. In contrast, row # 3 of dfB also falls in interval of row # 1 of dfA but the name value is different therefore, I don't want this record.
The expected out therefore, becomes:
##new_dfA
Name gender start_coordinate end_coordinate ID
Peter M 30 150 1
Jennie F 300 700 3
##new_dfB
Name position string
Peter 89 aa
Jennie 568 bb
In reality, dfB is of size (443068765,10) and dfA is of size (100000,3) therefore, I don't want to use numpy broadcasting because I run into memory error. Is there a way to deal with this problem within pandas framework. Insights will be appreciated.