
I have a large DataFrame object where missing values are pre-coded as 0.001. These missing values only occur at the beginning of the DataFrame. For example:

df = pd.DataFrame({'a':[0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})

The problem is that sometimes there are actual 0.001 values that are not at the beginning of the DataFrame and that I don't want to drop (like in the example above).

What I want is:

df = pd.DataFrame({'a': [np.nan, np.nan, np.nan, 0.50, 0.10, 0.001, 0.75]})

But I can't figure out a simple way to drop only the 0.001 values at the beginning of the DataFrame while ignoring the ones that occur later on.

The dataset I'm working with is massive, so I was hoping to avoid looping through each column and each index (which is what I'm currently doing, but it takes a bit too long).
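For reference, the loop-based approach I'm currently using looks roughly like this (a sketch; the real DataFrame has many more columns):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})

# Walk each column from the top and replace 0.001 with NaN
# until the first non-0.001 value is reached, then stop.
for col in df.columns:
    for i in df.index:
        if df.at[i, col] == 0.001:
            df.at[i, col] = np.nan
        else:
            break
```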

Any ideas?


1 Answer


Here's an approach:

df.mask(df[df != 0.001].ffill().isnull(), np.nan)
Out: 
       a
0    NaN
1    NaN
2    NaN
3  0.500
4  0.100
5  0.001
6  0.750

This first selects the cells where the DataFrame does not equal 0.001; in that selection, every 0.001 cell becomes NaN. If you then forward fill, real values propagate downward, so only the NaNs at the very top of each column, which have nothing above them to copy, remain NaN. Using that null pattern as a mask on the original DataFrame replaces exactly the leading 0.001 values and leaves the later ones intact.
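Spelled out step by step, the approach looks like this (a minimal, self-contained sketch using the question's example):

```python
import pandas as pd

df = pd.DataFrame({'a': [0.001, 0.001, 0.001, 0.50, 0.10, 0.001, 0.75]})

# Step 1: indexing with df != 0.001 turns every 0.001 cell into NaN.
selected = df[df != 0.001]

# Step 2: forward-fill. Interior NaNs pick up the value above them,
# but the NaNs at the top of each column have nothing to copy and stay NaN.
filled = selected.ffill()

# Step 3: mask the original DataFrame wherever the filled frame is still NaN.
# mask() inserts NaN by default where the condition is True.
result = df.mask(filled.isnull())
```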


