2

I've been trying to get my code to work, but I'm having some trouble. It would be great if someone could help.

DF

  Col1              Col2          
  2017-01-01        Coffee
  2017-01-01        Muffin
  2017-01-01        Donut
  2017-01-01        Toast
  2017-01-01        
  2017-01-01        

How can I change Col2 so that every value that isn't Coffee or Muffin or null becomes 'Other'?

  Col1              Col2          
  2017-01-01        Coffee
  2017-01-01        Muffin
  2017-01-01        Other
  2017-01-01        Other
  2017-01-01        
  2017-01-01 

Edit:

df.loc[~df.Col2.isin(['Coffee','Muffin']), 'Col2'] = 'Other'

^ This is where I am right now, but how can I add a null check to the isin condition?

3
  • "get my code to work" - please include your code. Commented Jan 30, 2018 at 3:33
  • Is that NaN or an empty string? They aren't the same thing. Commented Jan 30, 2018 at 3:42
  • It is blank; df.describe() reports it as a missing value. Commented Jan 30, 2018 at 3:43

3 Answers

3

You were almost there. If you're working with NaNs, you'll need an additional check with isnull. Create a mask and set values with loc -

m = ~(df.Col2.isin(['Coffee', 'Muffin']) | df.Col2.isnull())
df.loc[m, 'Col2'] = 'Other'

df

         Col1    Col2
0  2017-01-01  Coffee
1  2017-01-01  Muffin
2  2017-01-01   Other
3  2017-01-01   Other
4  2017-01-01     NaN
5  2017-01-01     NaN
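For reference, here is a self-contained sketch of the approach above. The construction of df is rebuilt from the question's sample data, and the name keep is only illustrative; neither appears in the original post.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; the last two rows are NaN.
df = pd.DataFrame({
    "Col1": ["2017-01-01"] * 6,
    "Col2": ["Coffee", "Muffin", "Donut", "Toast", np.nan, np.nan],
})

keep = ["Coffee", "Muffin"]  # illustrative name: values to leave untouched

# True only for rows that are neither in `keep` nor missing.
m = ~(df["Col2"].isin(keep) | df["Col2"].isna())

# Assign through .loc so the original frame is modified in place.
df.loc[m, "Col2"] = "Other"

print(df)
```

isna is an alias of isnull, so either spelling gives the same mask.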

Or, if they're blanks (empty string, not NaN - they're different!), perform an equality comparison for the second condition -

m = ~(df.Col2.isin(['Coffee', 'Muffin']) | df.Col2.eq(''))
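If it is unclear whether the column holds NaN, empty strings, or a mix of both, the two checks can be combined. This is a minimal sketch under that assumption; the sample frame and the name blank are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one empty string and one NaN in Col2.
df = pd.DataFrame({
    "Col1": ["2017-01-01"] * 6,
    "Col2": ["Coffee", "Muffin", "Donut", "Toast", "", np.nan],
})

# Treat both NaN and the empty string as "blank".
blank = df["Col2"].isna() | df["Col2"].eq("")

# Rows to overwrite: not one of the kept values and not blank.
m = ~(df["Col2"].isin(["Coffee", "Muffin"]) | blank)
df.loc[m, "Col2"] = "Other"

print(df)
```

The empty string and the NaN are both left as they were; only Donut and Toast are replaced.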

Here are some more possibilities with np.where/pd.Series.where/pd.Series.mask -

df.Col2 = np.where(m, 'Other', df.Col2)

Or,

df.Col2 = df.Col2.where(~m, 'Other')

Or,

df.Col2 = df.Col2.mask(m, 'Other')

df

         Col1    Col2
0  2017-01-01  Coffee
1  2017-01-01  Muffin
2  2017-01-01   Other
3  2017-01-01   Other
4  2017-01-01     NaN
5  2017-01-01     NaN
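The three alternatives above can be compared side by side. This is a rough sketch on a standalone Series; the variable names are illustrative and the data is rebuilt from the question.

```python
import numpy as np
import pandas as pd

col = pd.Series(["Coffee", "Muffin", "Donut", "Toast", np.nan, np.nan], name="Col2")

# Same mask as above: True where the value should become 'Other'.
m = ~(col.isin(["Coffee", "Muffin"]) | col.isna())

# np.where returns a plain NumPy array, so wrap it to get a Series back.
via_np_where = pd.Series(np.where(m, "Other", col), index=col.index, name=col.name)

# Series.where keeps values where the condition is True, hence the inverted mask.
via_where = col.where(~m, "Other")

# Series.mask replaces values where the condition is True.
via_mask = col.mask(m, "Other")

print(via_mask)
```

where and mask are mirror images of each other and both preserve the index, while np.where works on the underlying values and returns an array.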

2
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['2017-01-01'] * 6,
                   'Col2': ['Coffee', 'Muffin', 'Donut', 'Toast', np.nan, np.nan]})

# Rows that are neither Coffee nor Muffin and are not missing.
conditions = (df['Col2'] != 'Coffee') & (df['Col2'] != 'Muffin') & df['Col2'].notnull()

# Assign through .loc; chained indexing (df['Col2'][conditions] = ...) may write to a copy.
df.loc[conditions, 'Col2'] = 'Other'

2

isin can also match np.nan, so it can be listed alongside the other values:

df.loc[df.Col2.isin(['Donut', 'Toast',np.nan]),'Col2']='Other'
df
Out[112]: 
         Col1    Col2
0  2017-01-01  Coffee
1  2017-01-01  Muffin
2  2017-01-01   Other
3  2017-01-01   Other
4  2017-01-01   Other
5  2017-01-01   Other

1 Comment

df.loc[~df.Col2.isin(['Coffee', 'Muffin', np.nan]), 'Col2'] = 'Other'
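The negated form in this comment keeps the NaN rows, which is what the question asks for. Below is a small sketch of it wrapped in a helper; collapse_to_other is a hypothetical name, not a pandas function, and it relies on isin matching NaN when np.nan is in the list, as the answer above shows.

```python
import numpy as np
import pandas as pd


def collapse_to_other(col, keep, label="Other"):
    """Return a copy of `col` where values outside `keep` become `label`.

    Missing values are left alone because np.nan is added to the allowed list.
    (Hypothetical helper for illustration only.)
    """
    allowed = list(keep) + [np.nan]
    # Series.where keeps entries where the condition holds and replaces the rest.
    return col.where(col.isin(allowed), label)


col = pd.Series(["Coffee", "Muffin", "Donut", "Toast", np.nan, np.nan])
result = collapse_to_other(col, ["Coffee", "Muffin"])
print(result)
```

Usage on the frame from the question would be df['Col2'] = collapse_to_other(df['Col2'], ['Coffee', 'Muffin']).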
