Python Pandas - Applying function to a list of values for each row

Question

I have a table (df1) with a list of values (neig_list, which is a python list) in each row

ID	neig_list
1	a, b, d
2	b, e, f, g, h
3	b, a, j, k

And a table (df2) with entries for those values

neig	samples	samples_indicator
'a'	3	0.5
'a'	5	0.1
'b'	1	0.2
'c'	15	0.5
'd'	12	0.3
'a'	2	1
'e'	5	0.6
'f'	6	0
'h'	6	0.5

I need to add a column to df1 getting, for each row, the result for the sum of samples x samples_indicator for all neigs that are contained in the neig_list for that row.

For example, for the first row, we would have:

3*0.5 + 5*0.1 + 1*0.2 + 12*0.3 + 2*1 = 7.8

ID	neig_list	new_column
1	a, b, d	7.8
2	b, e, f, g, h	value
3	b, a, j, k	value

Actually, the function is more complicated than that (involves more columns), so ideally I'd like to have a separate function and then apply it to df1, based on df2.

rafaelc · Accepted Answer · 2022-07-26 17:56:22Z

1

Calculate first your math in df2:

map_ = df2.assign(neig = df2['neig'].str.strip("'"), 
                  calculated = lambda df: df['samples'] * df['samples_indicator'])\
          .groupby('neig')['calculated'].sum()

Then, explode your first df, and map the values above for 'a', 'b' etc with the calculated formula. Finally, groupby and sum:

df['new_column'] = df['neig_list'].str.split(', ').explode()\
                                  .map(map_)\
                                  .groupby(level=0)\  
                                  .sum()

   ID      neig_list  new_column
0   1        a, b, d         7.8
1   2  b, e, f, g, h         6.2
2   3     b, a, j, k         4.2

answered Jul 26, 2022 at 17:56

rafaelc

59.4k15 gold badges64 silver badges87 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Ignatius Reilly · Accepted Answer · 2022-07-26 18:45:41Z

1

You can just define a function that performs the calculations for a given list of neigs using df2 and then just apply it to neig_list in df1:

def result(row):
    return sum([df2['samples'][item]*df2['samples_indicator'][item] for item in row])

df1['new_column'] = df1['neig_list'].apply(result)

Note that this requires neig to be the index in df2. If it's not, you can do df2.set_index('neig', inplace=True) or, if you don't want to modify d2:

def result(row):
    return sum([df2.set_index('neig')['samples'][item]*df2.set_index('neig')['samples_indicator'][item] for item in row])

and apply it the same way as before.

edited Jul 26, 2022 at 18:45

answered Jul 26, 2022 at 18:06

Ignatius Reilly

1,7524 gold badges9 silver badges18 bronze badges

Comments

constantstranger · Accepted Answer · 2022-07-26 18:55:15Z

Here's a way to do what your question asks:

def foo(df1, df2):
    return (df1
        .join(df1.assign(neig=df1.neig_list).explode('neig')
            .join(
                df2.assign(new_column=df2.samples * df2.samples_indicator)[['neig','new_column']].groupby('neig').sum(), 
                on='neig')
                .drop(columns=['neig','neig_list']).groupby('ID').sum(), 
            on='ID')
        )
print(foo(df1, df2))

Output:

   ID        neig_list  new_column
0   1        [a, b, d]         7.8
1   2  [b, e, f, g, h]         6.2
2   3     [b, a, j, k]         4.2

Explanation:

use assign() to add new_column as a column to df2 which, using groupby() and sum(), gets populated with the dot-product of samples and samples_indicator for the rows in each neig group
use assign() to clone the neig_list column of df1 as neig and explode() to expand each row to one row per item in the neig column
use join() on the above two DataFrame objects to put sample results from new_column into each row based on its neig value
use join() again with the above DataFrame object (after dropping the neig and neig_list columns) to add the desired column to the original df1.

Collectives™ on Stack Overflow

Python Pandas - Applying function to a list of values for each row

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

neig	samples	samples_indicator
'a'	3	0.5
'a'	5	0.1
'b'	1	0.2
'c'	15	0.5
'd'	12	0.3
'a'	2	1
'e'	5	0.6
'f'	6	0
'h'	6	0.5

neig	samples	samples_indicator
'a'	3	0.5
'a'	5	0.1
'b'	1	0.2
'c'	15	0.5
'd'	12	0.3
'a'	2	1
'e'	5	0.6
'f'	6	0
'h'	6	0.5

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related

neig	samples	samples_indicator
'a'	3	0.5
'a'	5	0.1
'b'	1	0.2
'c'	15	0.5
'd'	12	0.3
'a'	2	1
'e'	5	0.6
'f'	6	0
'h'	6	0.5