
I'm using the Titanic dataset and have created a series Famsize. I'd like to create a second series that outputs 'single' if Famsize == 1, 'small' if 1 < Famsize < 5, and 'large' if Famsize >= 5.

   Famsize FamsizeDisc
     1         single
     2         small
     5         large

I've tried using np.where but as I have three outputs I haven't been able to find a solution.
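For reference, np.where can still produce three outputs if one call is nested inside another: the outer call picks out 'single', and the inner call splits the remaining rows into 'small' and 'large'. This is a minimal sketch, assuming a small made-up DataFrame that mirrors the table above:

```python
import numpy as np
import pandas as pd

# Made-up sample data mirroring the table in the question
df = pd.DataFrame({'Famsize': [1, 2, 5]})

# Outer np.where handles Famsize == 1; the inner one decides small vs large
df['FamsizeDisc'] = np.where(
    df['Famsize'] == 1,
    'single',
    np.where(df['Famsize'] < 5, 'small', 'large'),
)
print(df)
```

A missing Famsize fails both tests and ends up labelled 'large', so NaN values would need separate handling with this approach.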

Any suggestions?

Do share what you've attempted so far. Commented Oct 5, 2017 at 11:00

2 Answers


It's called binning, so use pd.cut, i.e.:

# Bins are right-closed: (0, 1] -> single, (1, 4] -> small, (4, inf] -> large
df['new'] = pd.cut(df['Famsize'], bins=[0, 1, 4, np.inf], labels=['single', 'small', 'large'])

Output:

   Famsize FamsizeDisc     new
0        1      single  single
1        2       small   small
2        5       large   large
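To check the bin edges end to end, here is a self-contained sketch on made-up family sizes. Because Famsize is an integer count, the edge 4 behaves like "less than 5"; a fractional value such as 4.5 would fall in (4, inf] and be labelled 'large'. Note that pd.cut returns a categorical column, not plain strings:

```python
import numpy as np
import pandas as pd

# Made-up sizes covering every bin
df = pd.DataFrame({'Famsize': [1, 2, 3, 4, 5, 8]})

# pd.cut bins are right-closed by default: (0, 1], (1, 4], (4, inf]
df['FamsizeDisc'] = pd.cut(
    df['Famsize'],
    bins=[0, 1, 4, np.inf],
    labels=['single', 'small', 'large'],
)
print(df)
print(df['FamsizeDisc'].dtype)  # a categorical dtype
```

Values at or below 0 (and NaN) fall outside every bin and come back as NaN.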

You could either create a function that does the mapping:

def get_sizeDisc(x):
    if x == 1:
        return 'single'
    elif x < 5:
        return 'small'
    elif x >= 5:
        return 'large'

df['FamsizeDisc'] = df.Famsize.apply(get_sizeDisc)
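A self-contained run of the function above on made-up data that includes a missing value. Since NaN fails every comparison, the function falls through all three branches and returns None, which pandas treats as missing:

```python
import pandas as pd

def get_sizeDisc(x):
    # 1 -> single, 2-4 -> small, >= 5 -> large (same rules as the answer)
    if x == 1:
        return 'single'
    elif x < 5:
        return 'small'
    elif x >= 5:
        return 'large'
    # NaN fails every comparison above, so the function returns None

# Made-up sample data; the NaN turns the column into float64
df = pd.DataFrame({'Famsize': [1, 2, 5, float('nan')]})
df['FamsizeDisc'] = df['Famsize'].apply(get_sizeDisc)
print(df)
```

Unlike the vectorised options, apply calls the Python function once per row, which is usually fine at the size of the Titanic dataset.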

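Or you could use .loc:

df.loc[df.Famsize == 1, 'FamsizeDisc'] = 'single'
# inclusive='neither' means 1 < Famsize < 5 (pandas >= 1.3; older versions used inclusive=False)
df.loc[df.Famsize.between(1, 5, inclusive='neither'), 'FamsizeDisc'] = 'small'
df.loc[df.Famsize >= 5, 'FamsizeDisc'] = 'large'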
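The same three boolean masks can also be passed together to np.select, which assigns the label of the first condition that holds for each row. np.select is swapped in here and is not part of the answer; this is a minimal sketch on made-up data, assuming pandas >= 1.3 for inclusive='neither':

```python
import numpy as np
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'Famsize': [1, 2, 4, 5, 7]})

# Same three conditions as the .loc version, checked in order
conditions = [
    df['Famsize'] == 1,
    df['Famsize'].between(1, 5, inclusive='neither'),
    df['Famsize'] >= 5,
]
labels = ['single', 'small', 'large']

# Rows matching none of the conditions get the default label
df['FamsizeDisc'] = np.select(conditions, labels, default='unknown')
print(df)
```

The default argument makes unmatched rows explicit, whereas the .loc version leaves them as NaN.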

1 Comment

My bad, hadn't reloaded the page to see your answer. I'll remove it from my answer and upvote yours, as it's clearly the more concise solution :D
