I have large datasets of more than 1 million rows and a varying number of columns (sometimes one column, sometimes several). My script was working fine at first, but I recently ran into an issue that can be reproduced with the script below.

import pandas as pd
df=pd.DataFrame({'a':[0,0],'b':[100,1]})
df[df>0]='S1'
df[df==0]='S0'

Error:

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

Lines 3 and 4 can be swapped; the error is still raised on the 4th line, whichever assignment comes second.

Initial df:

a b
0 100
0 1 

Expected df:

a  b
S0 S1
S0 S1 
  • What is the issue? Can you add results or a stack trace with the undesirable behavior? Commented Aug 21, 2018 at 18:13

1 Answer

That approach isn't quite right for DataFrame-wide replacements. Use where or mask instead:

df = df.where(df == 0, 'S1').where(df > 0, 'S0')
df
    a   b
0  S0  S1
1  S0  S1
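
Since mask is mentioned but not shown, here is a minimal sketch of the same replacement written with mask, which replaces values where the condition is True (where replaces them where it is False). The names positive, zero and labeled are illustrative, and both conditions are computed from the original numeric frame before anything is replaced:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0], 'b': [100, 1]})

# Compute both conditions up front, while every column is still numeric.
positive = df > 0
zero = df == 0

# mask() replaces the cells where the condition is True and returns a new frame.
labeled = df.mask(positive, 'S1').mask(zero, 'S0')
print(labeled)
```

Because the masks are built before any strings are written, the second replacement never has to compare strings with numbers.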

Alternatively, you can use np.select:

import numpy as np

df[:] = np.select([df > 0, df == 0], ['S1', 'S0'], default=df)
df
    a   b
0  S0  S1
1  S0  S1
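
A possible variation on the np.select approach, sketched here rather than taken from the answer, builds a new DataFrame from the result instead of assigning back into df. The default label 'other' and the name labeled are assumptions for illustration; the snippet above keeps unmatched values via default=df instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0], 'b': [100, 1]})

values = df.to_numpy()

# Conditions are checked in order; cells matching neither get the default label.
# 'other' is a hypothetical placeholder, e.g. for negative values.
labels = np.select([values > 0, values == 0], ['S1', 'S0'], default='other')

# Wrap the labels in a new frame with the original index and column names.
labeled = pd.DataFrame(labels, index=df.index, columns=df.columns)
print(labeled)
```

Using a string default keeps every choice the same type, so the result does not mix numbers and strings.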

3 Comments

Hi, your solution works, but why is my approach wrong? Any idea?
@BhanuTez df[df>0] produces a DataFrame masked with NaNs that has the same shape and size as the original DataFrame, so there is no way for pandas to know exactly what you are trying to assign.
@BhanuTez No, your method does not work with DataFrames (Series, yes!), and will not work for the foreseeable future.
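
To illustrate the two comments above, here is a rough sketch that first shows the NaN-masked frame produced by df[df>0] and then applies the boolean assignment one Series at a time. The helper label_column is hypothetical, and casting to object before assigning is my own assumption so that the string assignment does not rely on pandas upcasting an integer column:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0], 'b': [100, 1]})

# Boolean indexing with a DataFrame keeps the shape; cells where the mask is False become NaN.
masked = df[df > 0]
print(masked)

def label_column(col):
    """Hypothetical helper: label one numeric Series as 'S1' (> 0) or 'S0' (== 0)."""
    out = col.astype(object)   # object dtype so strings can be stored
    out[col > 0] = 'S1'        # masks come from the original numeric column
    out[col == 0] = 'S0'
    return out

# Apply the Series-level assignment column by column and reassemble the frame.
labeled = pd.DataFrame({name: label_column(df[name]) for name in df.columns})
print(labeled)
```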
