Suppose I have three data structures:
- A data frame df1, with columns A, B, C, of length 10000
- A data frame df2, with columns A, some extra misc. columns..., of length 8000
- A Python list labels of length 8000, where the element at index i corresponds with row i in df2.
I'm trying to create a data frame that, for every element in df2.A, pairs the matching row from df1 with the corresponding entry in labels. It's possible that an entry in df2.A is NOT present in df1.A.
Currently, I'm doing this through a for i in xrange(len(df2)) loop, checking if df2.A.iloc[i] is present in df1.A, and if it is, I store df1.A, df1.B, df1.C, labels[i] into a dictionary with the first element as the key and the rest of the elements as a list.
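Roughly, the loop looks like the sketch below. This is a simplified reconstruction on toy data, not my exact code: the three-row frames reuse the sample values, the labels are assumed to be plain strings, and range stands in for xrange so it also runs on Python 3.

```python
import pandas as pd

# Toy stand-ins for the real 10000-row and 8000-row frames.
df1 = pd.DataFrame({
    "A": ["uid1", "uid2", "uid5"],
    "B": ["Bob", "Jack", "Cat"],
    "C": ["Rock", "Pop", "Country"],
})
df2 = pd.DataFrame({"A": ["uid10", "uid3", "uid1"]})
labels = ["label10", "label3", "label1"]  # assumed to be strings

result = {}
for i in range(len(df2)):              # xrange on Python 2
    key = df2.A.iloc[i]
    match = df1[df1.A == key]          # scans all of df1 on every iteration
    if not match.empty:                # skip ids that are missing from df1
        row = match.iloc[0]
        result[key] = [row.B, row.C, labels[i]]
```

Each iteration scans df1 in full, which is presumably why it gets so slow at this size.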
Is there a more efficient way to do this and store the outputs df1.A, df1.B, df1.C, labels[i] in a four-column data frame? The for loop is really slow.
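One vectorised possibility, offered as a sketch rather than a definitive answer, is to attach labels to df2 as a column and do an inner merge on A. The column name label and the helper frame df2_labeled are made up for illustration, and the toy frames again reuse the sample values with labels assumed to be strings.

```python
import pandas as pd

# Toy stand-ins for the real frames.
df1 = pd.DataFrame({
    "A": ["uid1", "uid2", "uid5"],
    "B": ["Bob", "Jack", "Cat"],
    "C": ["Rock", "Pop", "Country"],
})
df2 = pd.DataFrame({"A": ["uid10", "uid3", "uid1"]})
labels = ["label10", "label3", "label1"]  # assumed to be strings

# Attach each label to its df2 row by position (the list must be as long as df2).
# "label" is a hypothetical column name.
df2_labeled = df2[["A"]].assign(label=labels)

# The inner join keeps only ids present in both frames,
# so df2.A values missing from df1.A drop out.
result = df1.merge(df2_labeled, on="A", how="inner")
print(result)
```

On the toy data this leaves a single row for uid1 with columns A, B, C, label. If A is not unique in df1 or df2, the merge produces one row per matching pair, which may or may not be what is wanted.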
Sample data:
df1
A B C
'uid1' 'Bob' 'Rock'
'uid2' 'Jack' 'Pop'
'uid5' 'Cat' 'Country'
...
df2
A
'uid10'
'uid3'
'uid1'
...
labels
[label10, label3, label1, ...]
labels I want to join with a row of df1. But yeah, you understood what I'm trying to do correctly.