
I have a data frame called 'df', and I want to add a column whose value is determined by the range that the corresponding age value falls in:

    6 <= age < 11 then 1
    11 <= age < 16 then 2
    16 <= age < 21 then 3
    21 <= age then 4

            age
    86508   12.0
    86509   6.0
    86510   7.0
    86511   8.0
    86512   10.0
    86513   15.0
    86514   15.0
    86515   16.0
    86516   20.0
    86517   23.0
    86518   23.0
    86519   7.0
    86520   18.0
    

The expected result is:

            age    stage
    86508   12.0    2
    86509   6.0     1    
    86510   7.0     1
    86511   8.0     1
    86512   10.0    1
    86513   15.0    2
    86514   15.0    2
    86515   16.0    2
    86516   20.0    3
    86517   23.0    4
    86518   23.0    4
    86519   7.0     1
    86520   18.0    3

Thanks.

2 Answers


Use pd.cut():

In [37]: df['stage'] = pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4])

In [38]: df
Out[38]:
        age stage
86508  12.0     2
86509   6.0     1
86510   7.0     1
86511   8.0     1
86512  10.0     1
86513  15.0     2
86514  15.0     2
86515  16.0     2
86516  20.0     3
86517  23.0     4
86518  23.0     4
86519   7.0     1
86520  18.0     3

Or a more generic solution, provided by @ayhan:

In [39]: df['stage'] = pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1

In [40]: df
Out[40]:
        age  stage
86508  12.0      2
86509   6.0      1
86510   7.0      1
86511   8.0      1
86512  10.0      1
86513  15.0      2
86514  15.0      2
86515  16.0      2
86516  20.0      3
86517  23.0      4
86518  23.0      4
86519   7.0      1
86520  18.0      3
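
One caveat on the boundaries: pd.cut closes each bin on the right by default, so the calls above use the intervals (0, 11], (11, 16], (16, 21], which is why age 16.0 gets stage 2 in the output. The rules as written in the question are closed on the left instead (16 <= age < 21 then 3). If that is what is wanted, a minimal sketch with right=False might look like the following; the small frame here is made up for illustration and is not the asker's data.

```python
import numpy as np
import pandas as pd

# Illustrative ages only (includes the boundary values 6, 11 and 16)
df = pd.DataFrame({"age": [12.0, 6.0, 11.0, 16.0, 20.0, 23.0]})

# right=False makes each bin left-closed: [6, 11), [11, 16), [16, 21), [21, inf)
# labels=False returns the 0-based bin index, so add 1 to get stages 1..4
df["stage"] = pd.cut(
    df["age"], bins=[6, 11, 16, 21, np.inf], labels=False, right=False
) + 1

print(df)
```

With these bins, an age below 6 falls outside every interval and comes back as NaN, so the column would become float in that case.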

4 Comments

Nice! Way better than using conditional statements.
Great answer! I always forget pd.cut for this scenario. Next time:)
Thank you guys! :-)
pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1 might be more generic (both for bins and labels).

Using np.searchsorted

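import numpy as np
# Sorted bin edges; side='right' makes each interval left-closed: [6, 11), [11, 16), ...
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)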

        age  stage
86508  12.0      2
86509   6.0      1
86510   7.0      1
86511   8.0      1
86512  10.0      1
86513  15.0      2
86514  15.0      2
86515  16.0      3
86516  20.0      3
86517  23.0      4
86518  23.0      4
86519   7.0      1
86520  18.0      3
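
To see how side='right' treats the edges, here is a small check on boundary ages. The frame and the names edges and boundary are made up for this sketch; only the searchsorted call is taken from the answer above.

```python
import numpy as np
import pandas as pd

edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])

# Hypothetical ages sitting on or next to the bin edges
boundary = pd.DataFrame({"age": [6.0, 10.9, 11.0, 16.0, 21.0]})

# side='right' returns the first index whose edge is strictly greater than
# the age, so an age equal to an edge lands in the bin that starts there
boundary = boundary.assign(
    stage=edges.searchsorted(boundary["age"], side="right") - 1
)

print(boundary)
```

An age exactly on an edge (11.0, 16.0, 21.0) is assigned to the higher stage, which matches the left-closed rules in the question.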

Timing (small data)

%%timeit
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
1000 loops, best of 3: 288 µs per loop

%%timeit
df.assign(stage=pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4]))
1000 loops, best of 3: 668 µs per loop
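
The numbers above are for the 13-row frame. To repeat the comparison outside IPython or on a larger frame, a rough sketch with the standard timeit module could look like this. The frame big, its size, and the random ages are assumptions for illustration, and results will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 100,000 random ages in [6, 60), purely for illustration
rng = np.random.default_rng(0)
big = pd.DataFrame({"age": rng.integers(6, 60, size=100_000).astype(float)})

edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])


def with_searchsorted():
    return big.assign(stage=edges.searchsorted(big["age"], side="right") - 1)


def with_cut():
    return big.assign(
        stage=pd.cut(big["age"], bins=[0, 11, 16, 21, 300], labels=[1, 2, 3, 4])
    )


# Average wall-clock time per call over a few repetitions
for fn in (with_searchsorted, with_cut):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```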

1 Comment

Updating my bag of useful functions :) +1
