How to replace negative numbers in Pandas Data Frame by zero

Question

I would like to know if there is some way of replacing all negative numbers in a DataFrame with zeros.

Also, I believe your second line should read num[num < 0] = 0
– hlin117, Commented Feb 18, 2015 at 23:03

Community · Accepted Answer · 2017-05-23 11:33:13Z

If all your columns are numeric, you can use boolean indexing:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [3]: df
Out[3]: 
   a  b
0  0 -3
1 -1  2
2  2  1

In [4]: df[df < 0] = 0

In [5]: df
Out[5]: 
   a  b
0  0  0
1  0  2
2  2  1
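A small check of how the boolean-indexing assignment treats missing values, assuming a hypothetical float frame that contains a NaN. Because NaN < 0 evaluates to False, the mask never selects missing values, so they survive the assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical float frame with one missing value.
df = pd.DataFrame({'a': [0.5, -1.0, np.nan], 'b': [-3.0, 2.0, 1.0]})

# df < 0 is an element-wise boolean DataFrame; NaN < 0 is False,
# so the NaN in column 'a' is not selected and stays NaN.
mask = df < 0
df[mask] = 0

print(df)
```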

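For the more general case of mixed column types, this answer shows the private method _get_numeric_data. Because it is private, whether changes to num propagate back to df is an implementation detail that may differ between pandas versions: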

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                           'c': ['foo', 'goo', 'bar']})

In [3]: df
Out[3]: 
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar

In [4]: num = df._get_numeric_data()

In [5]: num[num < 0] = 0

In [6]: df
Out[6]: 
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar
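If you'd rather not depend on a private method, a minimal sketch using the public select_dtypes picks the numeric column labels and writes the clipped values back explicitly, so nothing hinges on whether an intermediate object is a view or a copy:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Labels of the numeric columns only ('a' and 'b' here).
num_cols = df.select_dtypes(include='number').columns

# Clip those columns and assign the result back by label; 'c' is untouched.
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```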

With timedelta type, boolean indexing seems to work on separate columns, but not on the whole dataframe. So you can do:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df
Out[3]: 
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days

In [4]: for k, v in df.items():
   ...:     df.loc[v < pd.Timedelta(0), k] = pd.Timedelta(0)
   ...:     

In [5]: df
Out[5]: 
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days

Update: comparison with a pd.Timedelta works on the whole DataFrame:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df[df < pd.Timedelta(0)] = pd.Timedelta(0)

In [4]: df
Out[4]: 
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
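The same comparison also works with mask, which returns a new frame instead of assigning in place. A minimal sketch, using unit='D' for the day unit; both the threshold and the replacement are Timedelta objects, so the columns keep their timedelta64 dtype:

```python
import pandas as pd

df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], unit='D'),
                   'b': pd.to_timedelta([-3, 2, 1], unit='D')})

zero = pd.Timedelta(0)

# Compare against a Timedelta (not the integer 0) and replace with a Timedelta.
df = df.mask(df < zero, zero)

print(df)
```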

Stefan · Accepted Answer · 2022-11-07 10:11:39Z

146

Another succinct way of doing this is pandas.DataFrame.clip.

For example:

import pandas as pd

In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})

In [21]: df
Out[21]: 
     a
0   -1
1  100
2   -2

In [22]: df.clip(lower=0)
Out[22]: 
     a
0    0
1  100
2    0
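Note that clip returns a new DataFrame and leaves the original alone. A short sketch of the two ways to keep the result: store the returned object, or pass inplace=True, in which case clip modifies the frame itself and returns None:

```python
import pandas as pd

df = pd.DataFrame({'a': [-1, 100, -2]})

# clip() returns a new DataFrame; df itself is unchanged.
clipped = df.clip(lower=0)

# With inplace=True the frame is modified directly and None is returned.
df_inplace = df.copy()
result = df_inplace.clip(lower=0, inplace=True)

print(clipped)
print(df_inplace)
```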

edited Nov 7, 2022 at 10:11

Stefan

answered May 17, 2016 at 17:41

follyroof

6 Comments

DomingoR Over a year ago

This is the inline solution I was looking for! Thanks!

gies0r Over a year ago

If you only want to apply clip to a specific column, you can do df['col_name'] = df['col_name'].clip(lower=0)

Sally Levesque Over a year ago

clip_lower has been deprecated, so stick to df.clip(lower=0)

Alaa M. Over a year ago

This seems to be the fastest method

Diedre Over a year ago

@DomingoR said this is inline, but it doesn't modify the values in place; you need to assign the result back to the same column for the clip to take effect.

Michael Conlin · Accepted Answer · 2019-10-19 22:27:16Z

23

Another clean option that I have found useful is pandas.DataFrame.mask which will "replace values where the condition is true."

Create the DataFrame:

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [4]: df
Out[4]: 
   a  b
0  0 -3
1 -1  2
2  2  1

Replace negative numbers with 0:

In [5]: df.mask(df < 0, 0)
Out[5]: 
   a  b
0  0  0
1  0  2
2  2  1

Or, replace negative numbers with NaN, which I frequently need:

In [7]: df.mask(df < 0)
Out[7]: 
     a    b
0  0.0  NaN
1  NaN  2.0
2  2.0  1.0
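mask also accepts a callable as the condition, which is handy in method chains where the intermediate frame has no variable name. A minimal sketch; the .mul(10) step is just a hypothetical earlier step in the chain:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

result = (
    df
    .mul(10)                    # hypothetical preceding step in the chain
    .mask(lambda d: d < 0, 0)   # d is the frame produced by .mul(10)
)

print(result)
```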

answered Oct 19, 2019 at 22:27

Michael Conlin

1 Comment

mirekphd Over a year ago

.mask() is as KISS as it gets!

Wickkiey · Accepted Answer · 2020-12-26 02:16:50Z

15

With a lambda function:

df['column'] = df['column'].apply(lambda x : x if x > 0 else 0)
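One difference worth knowing: with this lambda, NaN > 0 is False, so missing values become 0, whereas the vectorized Series.clip keeps them as NaN and avoids a Python-level loop. A small comparison on a hypothetical column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': [-1.5, 0.0, 2.0, np.nan]})

# The lambda approach: NaN fails the x > 0 test and is turned into 0.
via_apply = df['column'].apply(lambda x: x if x > 0 else 0)

# Vectorized alternative: negatives become 0, NaN stays NaN.
via_clip = df['column'].clip(lower=0)

print(via_apply.tolist())
print(via_clip.tolist())
```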

answered Dec 26, 2020 at 2:16

Wickkiey

alacy · Accepted Answer · 2015-01-04 00:58:13Z

14

Perhaps you could use DataFrame.where, which keeps the values where the condition is True and replaces the rest, like so:

data_frame = data_frame.where(data_frame >= 0, 0)
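Since where keeps the values for which the condition is True, the condition has to describe the values you want to keep (here, the non-negative ones). A small sketch showing that this is the mirror image of mask, which replaces the values for which the condition is True:

```python
import pandas as pd

data_frame = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# where: keep values where the condition holds, use 0 elsewhere.
cleaned = data_frame.where(data_frame >= 0, 0)

# mask: replace values where the condition holds.
masked = data_frame.mask(data_frame < 0, 0)

print(cleaned)
```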

answered Jan 4, 2015 at 0:58

alacy

Stephen Rauch · Accepted Answer · 2018-11-05 01:46:18Z

2

If you are dealing with a large df (40m x 700 in my case), iterating over the columns like this was much faster and more memory-efficient:

for col in df.columns:
    # Write through .loc instead of the chained form df[col][df[col] < 0] = 0
    df.loc[df[col] < 0, col] = 0
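The loop assumes every column is numeric; comparing a string column with 0 raises a TypeError. A minimal sketch for a mixed frame like the one in the first answer restricts the loop to numeric columns and writes each update through .loc, so it goes straight into df rather than into a possible copy:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Only loop over numeric columns; 'c' is skipped.
for col in df.select_dtypes(include='number').columns:
    # Row mask and column label in a single .loc call, no chained indexing.
    df.loc[df[col] < 0, col] = 0

print(df)
```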

edited Nov 5, 2018 at 1:46

Stephen Rauch♦

answered Nov 5, 2018 at 1:26

MarKo9

2 Comments

ajrlewis Over a year ago

You get an "A value is trying to be set on a copy of a slice from a DataFrame" warning when you do this

user10381466 Over a year ago

Perhaps using .copy() will avoid it

Terminator17 · Accepted Answer · 2021-11-07 05:28:55Z

1

A slight modification of the existing answers.

Identify all the numeric columns and build a DataFrame containing only those columns, then replace the negative values with NaN in that new DataFrame:

import numpy as np
import pandas as pd

df_numeric = df.select_dtypes(include=[np.number])
df_numeric = df_numeric.where(lambda x: x >= 0, np.nan)

Now drop those numeric columns from the main DataFrame and concatenate the processed values back onto it (this moves the numeric columns to the end):

numeric_cols = df_numeric.columns.values
df = df.drop(columns=numeric_cols)
df = pd.concat([df, df_numeric], axis=1)
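The drop + concat step moves the numeric columns to the end of the frame. A minimal sketch of the same idea (negative values become NaN) that assigns the processed columns back by label instead, which keeps the original column order; the frame here is hypothetical, with the string column deliberately placed in the middle:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'c': ['foo', 'goo', 'bar'], 'b': [-3, 2, 1]})

numeric_cols = df.select_dtypes(include=[np.number]).columns

# Keep non-negative values, use NaN elsewhere, and write back by column label.
df[numeric_cols] = df[numeric_cols].where(df[numeric_cols] >= 0, np.nan)

print(df)
```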

answered Nov 7, 2021 at 5:28

Terminator17

tef2128 · Accepted Answer · 2022-09-01 23:14:49Z

0

If you have a dataset of mixed data types, also consider moving the non-numerics to the index, updating the data, then removing the index:

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                       'c': ['foo', 'goo', 'bar']})
df = df.set_index('c')
df[df < 0] = 0
df = df.reset_index()

The approach using _get_numeric_data() didn't work for me for some reason.
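A minimal sketch generalizing this, assuming the frame has at least one non-numeric column: it moves every non-numeric column into the index automatically, then restores the original column order, since reset_index() puts the former index columns first:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

original_order = list(df.columns)

# Move all non-numeric columns into the index so what remains is numeric.
non_numeric = df.select_dtypes(exclude='number').columns.tolist()
tmp = df.set_index(non_numeric)
tmp[tmp < 0] = 0

# reset_index() puts the index columns first; reorder to match the input.
df = tmp.reset_index()[original_order]

print(df)
```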

answered Sep 1, 2022 at 23:14

tef2128

Alfredo EP · Accepted Answer · 2023-05-31 18:39:19Z

0

Try this:

df.loc[:, :] = df.mask(df < 0, 0)

Writing the result through .loc avoids the unpredictable behavior described in the pandas documentation under "Returning a view versus a copy".

answered May 31, 2023 at 18:39

Alfredo EP
