5

I'm new to pandas and pretty confused about it, especially compared to lists and list comprehensions.

I have a dataframe with 4 columns. I want to create a 5th column "c" based on the 4th column "m". I can get the value for "c" by applying my function to each row in column "m".

If "m" were a list, I would use a list comprehension:

c = [myfunction(x) for x in m]

How do I apply this "logic" to a dataframe?

3
  • Try this: df['c'] = df['m'].apply(myfunction) Commented Feb 16, 2016 at 5:38
  • If you actually need to apply the function separately to each row, it would be df['c'] = df['m'].map(myfunction). But often that is not the best way to go as it doesn't take advantage of pandas' vectorized operations, where lots of operations can be applied very quickly to a whole column at once. If you can include more detail in your post, people can let you know the best way to achieve this. Commented Feb 16, 2016 at 5:40
  • @Marius how can you achieve built-in vectorized pandas behavior? Commented Mar 14, 2018 at 16:34

3 Answers

8

You can use assign (sample from the docs):

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
print(df)
    A         B
0   1  0.769028
1   2 -0.392471
2   3  0.153051
3   4 -0.379848
4   5 -0.665426
5   6  0.880684
6   7  1.126381
7   8 -0.559828
8   9  0.862935
9  10 -0.909402

df = df.assign(ln_A = lambda x: np.log(x.A))
print(df)
    A         B      ln_A
0   1  0.769028  0.000000
1   2 -0.392471  0.693147
2   3  0.153051  1.098612
3   4 -0.379848  1.386294
4   5 -0.665426  1.609438
5   6  0.880684  1.791759
6   7  1.126381  1.945910
7   8 -0.559828  2.079442
8   9  0.862935  2.197225
9  10 -0.909402  2.302585

Or use apply, as Lu Qi commented.

Sometimes a lambda function is helpful:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})

df['ln_A'] = df['A'].apply(np.log)
df['round'] = df['B'].apply(lambda x: np.round(x, 2))
print(df)

    A         B      ln_A  round
0   1 -0.982828  0.000000  -0.98
1   2  2.306111  0.693147   2.31
2   3  0.967858  1.098612   0.97
3   4 -0.286280  1.386294  -0.29
4   5 -2.026937  1.609438  -2.03
5   6  0.061735  1.791759   0.06
6   7 -0.506620  1.945910  -0.51
7   8 -0.309438  2.079442  -0.31
8   9 -1.261842  2.197225  -1.26
9  10  1.079921  2.302585   1.08
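
Applied to the question's setup, a minimal sketch might look like the following. Only the column names "m" and "c" come from the question; the data, the body of myfunction, and the extra column c_listcomp are hypothetical stand-ins for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's function; the real one is not shown.
def myfunction(x):
    return x * 2 + 1

# Hypothetical data; only the column name "m" comes from the question.
df = pd.DataFrame({"m": [1, 2, 3]})

# Element-wise equivalent of: c = [myfunction(x) for x in m]
df["c"] = df["m"].apply(myfunction)

# The list comprehension itself also works, because a list of matching
# length can be assigned directly as a new column.
df["c_listcomp"] = [myfunction(x) for x in df["m"]]

print(df)
```

Both columns should hold the same values; apply just keeps the operation inside pandas.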

0

Since pandas is built on top of NumPy, you can easily apply a function to a numpy.array. The following example might help. You can convert a list (or a column) to a numpy.array and then do a vectorized computation.

import numpy as np
import pandas as pd
data = pd.DataFrame([[1,2],[3,4]],columns=['a','b'])
def square(x):
    return x ** 2
data['c'] = square(np.array(data.a))
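
As a sketch of the same idea, arithmetic operators also work directly on a pandas Series, so the explicit conversion to numpy.array is optional for a function like square. The column name c_apply below is an assumption, added only to compare against the element-wise version.

```python
import pandas as pd

data = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

def square(x):
    return x ** 2

# Vectorized: the whole column is passed to square() in a single call.
data["c"] = square(data["a"])

# Element-wise: square() is called once per value. The result is the same,
# but this is typically slower on large columns.
data["c_apply"] = data["a"].apply(square)

print(data)
```

This only works when the function body consists of operations that pandas or NumPy can apply to a whole column; arbitrary Python logic (for example, branching on a single value) still needs apply or map.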

2 Comments

@jezrael isn't the numpy array operation faster than apply?
@Mike Palmice I agree, comment was removed.
0

The following is analogous to the generic list comprehension case, where the function may need several columns from each row:

import pandas as pd

def some_fn(x):
    # return some_other_fn(x.Colname1, x.Colname2, ...)
    return x.a + x.b

df = pd.DataFrame({'a' : [1, 2], 'b' : [3, 4]})
df['c'] = [some_fn(row) for ind, row in df.iterrows()]
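
For comparison, here is a hedged sketch of two alternatives to iterrows for the same some_fn: a row-wise DataFrame.apply with axis=1, and a direct call on the whole DataFrame. The second one works here only because some_fn uses nothing but attribute access and +; the column name c_vec is an assumption for illustration.

```python
import pandas as pd

def some_fn(x):
    return x.a + x.b

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Row-wise apply: some_fn receives each row as a Series.
df["c"] = df.apply(some_fn, axis=1)

# Vectorized: some_fn receives the whole DataFrame, so x.a and x.b are
# full columns and the addition happens in one step.
df["c_vec"] = some_fn(df)

print(df)
```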
