Python Pandas: Add column based on other column

Question

I'm new to pandas and pretty confused by it, especially compared to lists and list comprehensions.

I have a dataframe with 4 columns. I want to create a 5th column "c" based on 4th column "m". I can get the value for "c" by applying my function for each row in column "m".

If "m" were a list, with a list comprehension it would be:

c = [myfunction(x) for x in m]

How do I apply this "logic" to a dataframe?

If you actually need to apply the function separately to each row, it would be df['c'] = df['m'].map(myfunction). But often that is not the best way to go, as it doesn't take advantage of pandas' vectorized operations, where an operation is applied very quickly to a whole column at once. If you can include more detail in your post, people can let you know the best way to achieve this.
– Marius, Commented Feb 16, 2016 at 5:40
@Marius how can you achieve built-in vectorized pandas behavior?
– 3pitt, Commented Mar 14, 2018 at 16:34
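
To illustrate the difference these comments are discussing, here is a minimal sketch. `myfunction` is a hypothetical stand-in (it just doubles a value), since the question does not say what the real function does. The vectorized form is only available when the function can be written as whole-column arithmetic.

```python
import pandas as pd

# Hypothetical stand-in for the asker's function (an assumption).
def myfunction(x):
    return x * 2

df = pd.DataFrame({"m": [1, 2, 3]})

# Element-wise: calls myfunction once per value, like the list comprehension.
df["c_map"] = df["m"].map(myfunction)

# Vectorized: the same arithmetic applied to the whole column at once.
df["c_vec"] = df["m"] * 2

print(df)
```

Both columns hold the same values. The vectorized version avoids a Python-level function call per row, which is usually where the speed difference comes from.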

Community · Accepted Answer · 2017-05-23 11:50:33Z

You can use assign. A sample adapted from the docs:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
print(df)
    A         B
0   1  0.769028
1   2 -0.392471
2   3  0.153051
3   4 -0.379848
4   5 -0.665426
5   6  0.880684
6   7  1.126381
7   8 -0.559828
8   9  0.862935
9  10 -0.909402

df = df.assign(ln_A = lambda x: np.log(x.A))
print(df)
    A         B      ln_A
0   1  0.769028  0.000000
1   2 -0.392471  0.693147
2   3  0.153051  1.098612
3   4 -0.379848  1.386294
4   5 -0.665426  1.609438
5   6  0.880684  1.791759
6   7  1.126381  1.945910
7   8 -0.559828  2.079442
8   9  0.862935  2.197225
9  10 -0.909402  2.302585

Or apply as Lu Qi commented.

Sometimes a lambda function is helpful:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})

df['ln_A'] = df['A'].apply(np.log)
df['round'] = df['B'].apply(lambda x: np.round(x, 2))
print(df)

    A         B      ln_A  round
0   1 -0.982828  0.000000  -0.98
1   2  2.306111  0.693147   2.31
2   3  0.967858  1.098612   0.97
3   4 -0.286280  1.386294  -0.29
4   5 -2.026937  1.609438  -2.03
5   6  0.061735  1.791759   0.06
6   7 -0.506620  1.945910  -0.51
7   8 -0.309438  2.079442  -0.31
8   9 -1.261842  2.197225  -1.26
9  10  1.079921  2.302585   1.08
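
For these two particular columns, apply is probably not needed. As a rough sketch under the same setup: NumPy ufuncs such as np.log accept a Series directly, and Series.round covers what the lambda does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})

# NumPy ufuncs accept a Series directly and return a Series.
df["ln_A"] = np.log(df["A"])

# Series.round is the built-in counterpart of the np.round lambda.
df["round"] = df["B"].round(2)

print(df)
```

apply remains useful when the function has no vectorized equivalent.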

MaxGu · Accepted Answer · 2016-02-16 06:59:42Z

Since pandas is built on top of NumPy, you can easily apply a function to a numpy.array. The following example might help. You can convert a list (or a column) to a numpy.array and then do a vectorized computation.

import numpy as np
import pandas as pd
data = pd.DataFrame([[1,2],[3,4]],columns=['a','b'])
def square(x):
    return x ** 2
data['c'] = square(np.array(data.a))
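
As a possible variation on the example above (a sketch, not the answerer's code): arithmetic like `x ** 2` also works on the Series itself, so the explicit conversion can often be skipped. When a plain array is wanted, `to_numpy()` is the accessor pandas documents for that.

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

def square(x):
    return x ** 2

# Passing the Series itself keeps the index and avoids the conversion.
data["c"] = square(data["a"])

# If a plain NumPy array is needed, to_numpy() returns one.
data["d"] = square(data["a"].to_numpy())

print(data)
```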

edited Feb 16, 2016 at 6:59

answered Feb 16, 2016 at 6:47


2 Comments

3pitt Over a year ago

@jezrael isn't the numpy array operation faster than apply?

jezrael Over a year ago

@Mike Palmice I agree, comment was removed.

3pitt · Accepted Answer · 2018-03-14 16:58:54Z

The following is analogous to the generic list comprehension case:

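import pandas as pd

def some_fn(x):
    # x is one row (a Series); in general: return some_other_fn(x.Colname1, x.Colname2, ...)
    return x.a + x.b

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
# iterrows() yields (index, row) pairs; only the row is used here.
df['c'] = [some_fn(row) for ind, row in df.iterrows()]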
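
For comparison, here is a sketch of two other ways to get the same column with the same `some_fn`: a row-wise `apply` with `axis=1`, and plain column arithmetic. The column names `c_apply` and `c_vec` are only illustrative.

```python
import pandas as pd

def some_fn(x):
    return x.a + x.b

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Row-wise apply: some_fn receives each row as a Series.
df["c_apply"] = df.apply(some_fn, axis=1)

# Vectorized: the same sum computed on whole columns.
df["c_vec"] = df["a"] + df["b"]

print(df)
```

The row-wise forms (iterrows and apply with axis=1) still call the function once per row, so the column arithmetic is usually the faster choice when the function allows it.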

answered Mar 14, 2018 at 16:58

