I have a CSV file that I am turning into a pandas DataFrame. One of the columns, a, is mostly filled with nonzero numbers but occasionally contains zeros. I would like to make a new column e that is the value in column a plus 500, EXCEPT where column a is 0 in that row; in those cases e should just stay 0. Alternatively, it would also work to set e = a + 500 and then turn every 500 in e back into a zero. Any help would be great; I'm new to using pandas/Python.
Hi! Welcome to Stack Overflow! Your question is too broad as is. Please read How to Ask and start trying yourself. Once you get stuck, ask here providing a minimal reproducible example! Thank you! – jkalden, Jan 22, 2016 at 13:38
I disagree with @jkalden, as the question is rather specific, but the title is very misleading. Please edit your title to something more specific, like "adding value to a column in a df in pandas python depending on another column's value". I agree you should add a minimal example that at least shows input and expected output. – Fabian Rost, Jan 22, 2016 at 13:48
3 Answers
Example data frame
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [100, 0, 200], 'b': [200, 500, 0]})
>>> df
     a    b
0  100  200
1    0  500
2  200    0
Solution
You can use where(), which keeps the values where the condition is True and fills in the second argument elsewhere, to build the new column efficiently:
>>> df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
>>> df
     a    b    e
0  100  200  600
1    0  500    0
2  200    0  700
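For comparison, numpy.where produces the same column; this is just a sketch of an equivalent approach and assumes numpy is available:
>>> import numpy as np
>>> # np.where(cond, x, y) picks x where cond is True and y elsewhere
>>> df['e'] = np.where(df['a'] == 0, 0, df['a'] + 500)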
Performance
For a data frame with three million rows:
n = int(1e6)
df = pd.DataFrame({'a': [100, 0, 200] * n, 'b': [200, 500, 0] * n})
using map with a lambda (as suggested in another answer here) is pretty slow:
%timeit df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)
1 loops, best of 3: 2.5 s per loop
compared to using where():
%timeit df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
10 loops, best of 3: 90.9 ms per loop
It is about 28 times faster.
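The question also mentions the alternative of computing a + 500 first and then turning the affected rows back into zeros. A boolean mask with .loc does that; this is a minimal sketch that masks on the original column a (rather than on e == 500) so the intent stays explicit:
>>> df['e'] = df['a'] + 500
>>> # rows where a was 0 get reset to 0
>>> df.loc[df['a'] == 0, 'e'] = 0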