I have a table (df1) with a list of values in each row (neig_list, which is a Python list):
| ID | neig_list |
|---|---|
| 1 | a, b, d |
| 2 | b, e, f, g, h |
| 3 | b, a, j, k |
And a second table (df2) with entries for those values:
| neig | samples | samples_indicator |
|---|---|---|
| 'a' | 3 | 0.5 |
| 'a' | 5 | 0.1 |
| 'b' | 1 | 0.2 |
| 'c' | 15 | 0.5 |
| 'd' | 12 | 0.3 |
| 'a' | 2 | 1 |
| 'e' | 5 | 0.6 |
| 'f' | 6 | 0 |
| 'h' | 6 | 0.5 |
I need to add a column to df1 that holds, for each row, the sum of `samples * samples_indicator` over all df2 rows whose neig is contained in that row's neig_list.
For example, for the first row, we would have:
`3*0.5 + 5*0.1 + 1*0.2 + 12*0.3 + 2*1 = 7.8`
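As a quick check of that arithmetic, the first row's value can be reproduced in pandas by filtering df2 with `isin` and summing the element-wise product. This is a minimal sketch that assumes df2 is rebuilt from the table above and that the first row's neig_list is the Python list `["a", "b", "d"]`:

```python
import pandas as pd

# df2 rebuilt from the table in the question (values copied as shown)
df2 = pd.DataFrame({
    "neig": ["a", "a", "b", "c", "d", "a", "e", "f", "h"],
    "samples": [3, 5, 1, 15, 12, 2, 5, 6, 6],
    "samples_indicator": [0.5, 0.1, 0.2, 0.5, 0.3, 1, 0.6, 0, 0.5],
})

neig_list = ["a", "b", "d"]  # neig_list of the first row (ID 1)

# Keep only the df2 rows whose neig appears in this row's list
matches = df2[df2["neig"].isin(neig_list)]

# Sum of samples * samples_indicator over the matching rows
row_value = (matches["samples"] * matches["samples_indicator"]).sum()
print(round(row_value, 2))  # prints 7.8
```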
| ID | neig_list | new_column |
|---|---|---|
| 1 | a, b, d | 7.8 |
| 2 | b, e, f, g, h | value |
| 3 | b, a, j, k | value |
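One vectorized way to build that column, sketched here under the assumption that neig_list holds real Python lists and that ID is unique, is to explode df1 to one row per (ID, neig) pair, inner-merge with df2, and sum the products per ID:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "neig_list": [["a", "b", "d"], ["b", "e", "f", "g", "h"], ["b", "a", "j", "k"]],
})
df2 = pd.DataFrame({
    "neig": ["a", "a", "b", "c", "d", "a", "e", "f", "h"],
    "samples": [3, 5, 1, 15, 12, 2, 5, 6, 6],
    "samples_indicator": [0.5, 0.1, 0.2, 0.5, 0.3, 1, 0.6, 0, 0.5],
})

# One row per (ID, neig) pair
exploded = df1[["ID", "neig_list"]].explode("neig_list").rename(columns={"neig_list": "neig"})

# Attach every matching df2 entry; the inner join drops neigs with no entries (g, j, k)
merged = exploded.merge(df2, on="neig", how="inner")
merged["product"] = merged["samples"] * merged["samples_indicator"]

# Sum per ID and map back onto df1; an ID with no matches gets 0
totals = merged.groupby("ID")["product"].sum()
df1["new_column"] = df1["ID"].map(totals).fillna(0)
print(df1)
```

With the sample data this gives roughly 7.8, 6.2 and 4.2 for IDs 1, 2 and 3. Note that if the same neig were listed twice in one row, `explode` would count its df2 entries twice; deduplicate the lists first if that matters.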
In practice the function is more complicated than that (it involves more columns), so ideally I'd like to write it as a separate function and apply it to each row of df1, using df2.
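A minimal sketch of the separate-function approach, assuming pandas: `compute_new_column` is a hypothetical name, and its body only reproduces the simple sum from the question as a placeholder for the real logic. `df1.apply(..., axis=1)` passes each df1 row as a Series, so other df1 columns are available inside the function, and df2 is bound through a lambda:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "neig_list": [["a", "b", "d"], ["b", "e", "f", "g", "h"], ["b", "a", "j", "k"]],
})
df2 = pd.DataFrame({
    "neig": ["a", "a", "b", "c", "d", "a", "e", "f", "h"],
    "samples": [3, 5, 1, 15, 12, 2, 5, 6, 6],
    "samples_indicator": [0.5, 0.1, 0.2, 0.5, 0.3, 1, 0.6, 0, 0.5],
})


def compute_new_column(row, entries):
    """Hypothetical per-row function; replace the body with the real logic.

    `row` is one row of df1 (a Series), so any df1 column is available as
    row["column_name"]; `entries` is the whole of df2.
    """
    # df2 rows whose neig is in this row's list
    matches = entries[entries["neig"].isin(row["neig_list"])]
    # Placeholder formula from the question: sum of samples * samples_indicator
    return (matches["samples"] * matches["samples_indicator"]).sum()


# axis=1 hands each df1 row to the function; df2 is passed in via the lambda
df1["new_column"] = df1.apply(lambda row: compute_new_column(row, df2), axis=1)
print(df1)
```

This keeps the per-row logic in one readable place, at the cost of scanning df2 once per df1 row; for large tables an explode-and-merge approach is usually faster.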