I have a DataFrame:

import pandas as pd
data = {'A': ['SA01', '0007', 'SA06', '0198', 'SA06'], 
        'B': [2012, 2012, 2013, 2014, 2014], }
df = pd.DataFrame(data)

df = A     B
     SA01  2012
     0007  2012
     SA06  2013
     0198  2014
     SA06  2014

I want to use df.apply or another pandas function to add a column df['C'] as follows:

df = A     B     C
     SA01  2012  M
     0007  2012  F
     SA06  2013  M
     0198  2014  F
     SA06  2014  M

If df['A'] contains the substring 'SA', then df['C'] should be 'M', otherwise 'F'. How can I do this?

1 Answer

Use numpy.where with a boolean mask created by str.contains or str.startswith:

import numpy as np

# 'M' where column A contains 'SA', otherwise 'F'
df['C'] = np.where(df['A'].str.contains('SA'), 'M', 'F')
# alternative solution: match only at the start of the string
# df['C'] = np.where(df['A'].str.startswith('SA'), 'M', 'F')
print(df)
      A     B  C
0  SA01  2012  M
1  0007  2012  F
2  SA06  2013  M
3  0198  2014  F
4  SA06  2014  M
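
Since the question also mentions df.apply, here is a rough, self-contained sketch comparing three ways to build the same column. The column names C_where, C_map and C_apply are made up for this illustration, and the regex=False / na=False arguments are an assumption about what is wanted (a literal substring test, with missing values counted as "no match"). The apply variant assumes every value in A is a string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['SA01', '0007', 'SA06', '0198', 'SA06'],
    'B': [2012, 2012, 2013, 2014, 2014],
})

# Literal substring test: regex=False stops 'SA' being treated as a pattern,
# and na=False makes missing values count as "no match".
mask = df['A'].str.contains('SA', regex=False, na=False)

# Variant 1: numpy.where on the boolean mask (vectorized).
df['C_where'] = np.where(mask, 'M', 'F')

# Variant 2: map the boolean mask through a dict, no numpy needed.
df['C_map'] = mask.map({True: 'M', False: 'F'})

# Variant 3: element-wise apply; assumes all values in A are strings
# and is typically slower than the vectorized variants on large frames.
df['C_apply'] = df['A'].apply(lambda s: 'M' if 'SA' in s else 'F')
```

On this sample data all three columns should agree. For large frames the mask-based variants are usually the better choice, because apply calls the Python function once per row.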