142

I would like to know if there is someway of replacing all DataFrame negative numbers by zeros?

1
  • Also, I believe your second line should read num[num < 0] = 0 Commented Feb 18, 2015 at 23:03

9 Answers 9

155

If all your columns are numeric, you can use boolean indexing:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [3]: df
Out[3]: 
   a  b
0  0 -3
1 -1  2
2  2  1

In [4]: df[df < 0] = 0

In [5]: df
Out[5]: 
   a  b
0  0  0
1  0  2
2  2  1

For the more general case, this answer shows the private method _get_numeric_data:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                           'c': ['foo', 'goo', 'bar']})

In [3]: df
Out[3]: 
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar

In [4]: num = df._get_numeric_data()

In [5]: num[num < 0] = 0

In [6]: df
Out[6]: 
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar

With timedelta type, boolean indexing seems to work on separate columns, but not on the whole dataframe. So you can do:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df
Out[3]: 
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days

In [4]: for k, v in df.iteritems():
   ...:     v[v < 0] = 0
   ...:     

In [5]: df
Out[5]: 
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days

Update: comparison with a pd.Timedelta works on the whole DataFrame:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df[df < pd.Timedelta(0)] = 0

In [4]: df
Out[4]: 
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
Sign up to request clarification or add additional context in comments.

Comments

146

Another succinct way of doing this is pandas.DataFrame.clip.

For example:

import pandas as pd

In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})

In [21]: df
Out[21]: 
     a
0   -1
1  100
2   -2

In [22]: df.clip(lower=0)
Out[22]: 
     a
0    0
1  100
2    0

6 Comments

This is the inline solution I was looking for! Thanks!
If you only want to apply clip on a specific column you can go like df['col_name'] = df['col_name'].clip(lower=0)
clip_lower has been deprecated so rather stick to df.clip(lower=0)
This seems to be the fastest method
@DomingoR said this is inline, but this doesn't edit the values inplace, you need to assign to the same column to have the clip work.
|
23

Another clean option that I have found useful is pandas.DataFrame.mask which will "replace values where the condition is true."

Create the DataFrame:

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [4]: df
Out[4]: 
   a  b
0  0 -3
1 -1  2
2  2  1

Replace negative numbers with 0:

In [5]: df.mask(df < 0, 0)
Out[5]: 
   a  b
0  0  0
1  0  2
2  2  1

Or, replace negative numbers with NaN, which I frequently need:

In [7]: df.mask(df < 0)
Out[7]: 
     a    b
0  0.0  NaN
1  NaN  2.0
2  2.0  1.0

1 Comment

.mask() is as KISS as it gets!
15

With lambda function

df['column'] = df['column'].apply(lambda x : x if x > 0 else 0)

Comments

14

Perhaps you could use pandas.where(args) like so:

data_frame = data_frame.where(data_frame < 0, 0)

Comments

2

If you are dealing with a large df (40m x 700 in my case) it works much faster and memory savvy through iteration on columns with something like.

for col in df.columns:
    df[col][df[col] < 0] = 0

2 Comments

You get a A value is trying to be set on a copy of a slice from a DataFrame warning when you do this
Perhaps using .copy() will avoid it
1

A slight modification of the answers present.

Let's identify all the numeric columns and create a dataframe with all numeric values. Then replace the negative values with NaN in new dataframe

df_numeric = df.select_dtypes(include=[np.number])
df_numeric = df_numeric.where(lambda x: x > 0, np.nan)

Now, drop the columns where negative values are handled in the main data frame and then concatenate the new column values to the main data frame

numeric_cols = df_numeric.columns.values
df = df.drop(columns=numeric_cols)
df = pd.concat([df, df_numeric], axis = 1)

Comments

0

If you have a dataset of mixed data types, also consider moving the non-numerics to the index, updating the data, then removing the index:

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                       'c': ['foo', 'goo', 'bar']})
df = df.set_index('c')
df[df < 0] = 0
df = df.reset_index()

The approach using _get_numeric_data() didn't work for me for some reason.

Comments

0

Try this:

df.loc[(df < 0).index, :] = 0

To avoid getting a unpredicatable behavior: Returning a view versus a copy

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.