2

I have a csv file that I am turning into a pandas dataframe. One of the columns a is mostly filled with numbers and occasionally filled with zeros. I would like to make a new column e that is the number in column a+500, EXCEPT if there is a 0 in that row for the a column. In those cases it should just stay 0. Or I guess it would also work if column e was a+500, and then all of the cases in e that were just 500 were turned into zero. Any help would be great, I'm new to using pandas/python.

2
  • 1
    Hi! Welcome to Stack Overflow! Your question is too broad as is. Please read How to Ask and start trying yourself. Once you get stuck, ask here providing a minimal reproducible example! Thank you! Commented Jan 22, 2016 at 13:38
  • I disagree with @jkalden as the question is rather specific but the title is very misleading. Please edit your title to something more specific like "adding value to a column in a df in pandas python depending on another columns value". I agree you should add a minimal example that at least shows input and expected output. Commented Jan 22, 2016 at 13:48

3 Answers 3

1

Try this :

df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)
Sign up to request clarification or add additional context in comments.

Comments

1

Example data frame

>>> df = pd.DataFrame({'a': [100, 0, 200], 'b': [200, 500, 0]})
>>> df
     a    b
0  100  200
1    0  500
2  200    0

Solution

You can use where for fast generation of your column:

>>> df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
>>> df
     a    b    e
0  100  200  600
1    0  500    0
2  200    0  700

Performance

For a data frame with three million rows:

n = int(1e6)
df = pd.DataFrame({'a': [100, 0, 200] * n, 'b': [200, 500, 0] * n})

using apply (as suggested in another answer here) is pretty slow:

%timeit df['new_a'] = df['a'].astype('int').map(lambda x: x+500 if x != 0 else 0)
1 loops, best of 3: 2.5 s per loop

compared to using where():

%timeit df['e'] = df['a'].where(df['a'] == 0, df['a'] + 500)
10 loops, best of 3: 90.9 ms per loop

It is about 28 times faster.

Comments

0

I'd propose to write a function and use pd.apply like that:

import pandas as pd
df = pd.DataFrame({'a': [0, 1]})
def add500ifnot0(c):
    if c == 0:
        return c
    else:
        return c + 500
df['e'] = df['a'].apply(add500ifnot0)
df

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.