How to replace column values with range labels in a pandas DataFrame

Question

I have a DataFrame called df, and I want to add a column that maps each value in the age column to a stage number, based on the range the age falls in:

6 <= age < 11 then 1

11 <= age < 16 then 2

16 <= age < 21 then 3

21 <= age then 4

        age
86508   12.0
86509   6.0
86510   7.0
86511   8.0
86512   10.0
86513   15.0
86514   15.0
86515   16.0
86516   20.0
86517   23.0
86518   23.0
86519   7.0
86520   18.0
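
For anyone who wants to reproduce the example, the sample frame can be rebuilt roughly as follows. This is only a sketch: the index values and ages are copied from the listing above, and the consecutive integer index is an assumption based on what is shown.

```python
import pandas as pd

# Ages copied from the listing in the question, in the same order.
ages = [12.0, 6.0, 7.0, 8.0, 10.0, 15.0, 15.0, 16.0, 20.0, 23.0, 23.0, 7.0, 18.0]

# The index in the listing runs from 86508 to 86520 without gaps.
df = pd.DataFrame({"age": ages}, index=range(86508, 86521))
print(df)
```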

The expected result is:

            age    stage
    86508   12.0    2
    86509   6.0     1    
    86510   7.0     1
    86511   8.0     1
    86512   10.0    1
    86513   15.0    2
    86514   15.0    2
    86515   16.0    2
    86516   20.0    3
    86517   23.0    4
    86518   23.0    4
    86519   7.0     1
    86520   18.0    3

Thanks.

MaxU - stand with Ukraine · Accepted Answer · 2017-05-30 18:55:39Z

6

Use pd.cut():

In [37]: df['stage'] = pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4])

In [38]: df
Out[38]:
        age stage
86508  12.0     2
86509   6.0     1
86510   7.0     1
86511   8.0     1
86512  10.0     1
86513  15.0     2
86514  15.0     2
86515  16.0     2
86516  20.0     3
86517  23.0     4
86518  23.0     4
86519   7.0     1
86520  18.0     3

Or a more generic solution, suggested by @ayhan, that avoids hard-coding the upper bound and the labels:

In [39]: df['stage'] = pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1

In [40]: df
Out[40]:
        age  stage
86508  12.0      2
86509   6.0      1
86510   7.0      1
86511   8.0      1
86512  10.0      1
86513  15.0      2
86514  15.0      2
86515  16.0      2
86516  20.0      3
86517  23.0      4
86518  23.0      4
86519   7.0      1
86520  18.0      3
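
One detail worth checking: with the default right=True, pd.cut builds bins that are closed on the right, so an age of exactly 16 lands in stage 2 (as in the output above), while the rules as written in the question (16 <= age < 21) would put it in stage 3. If the half-open rules are what is wanted, a minimal sketch with right=False could look like this. The sample ages here are illustrative boundary values, not the asker's data.

```python
import numpy as np
import pandas as pd

# Illustrative ages chosen to sit on and around the bin edges.
df = pd.DataFrame({"age": [12.0, 6.0, 10.0, 11.0, 15.0, 16.0, 20.0, 21.0, 23.0]})

# right=False makes every bin closed on the left and open on the right:
# [6, 11), [11, 16), [16, 21), [21, inf)
bins = [6, 11, 16, 21, np.inf]

# labels=False returns zero-based bin positions, so add 1 to get stages 1..4.
df["stage"] = pd.cut(df["age"], bins=bins, labels=False, right=False) + 1
print(df)
```

Ages below 6 fall outside every bin and come back as NaN, which also turns the column into floats; whether that is acceptable depends on the data.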

edited May 30, 2017 at 18:55

answered May 30, 2017 at 18:48

4 Comments

A.Kot Over a year ago

Nice! Way better than using conditional statements.

Vaishali Over a year ago

Great answer! I always forget pd.cut for this scenario. Next time:)

MaxU - stand with Ukraine Over a year ago

Thank you, guys! :-)

user2285236 Over a year ago

pd.cut(df.age, bins=[0, 11, 16, 21, np.inf], labels=False, right=True) + 1 might be more generic (both for bins and labels).

piRSquared · Accepted Answer · 2017-05-30 18:55:37Z

3

Using np.searchsorted

a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)

        age  stage
86508  12.0      2
86509   6.0      1
86510   7.0      1
86511   8.0      1
86512  10.0      1
86513  15.0      2
86514  15.0      2
86515  16.0      3
86516  20.0      3
86517  23.0      4
86518  23.0      4
86519   7.0      1
86520  18.0      3
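
Note that 16.0 maps to stage 3 here, which follows the rules as written in the question (16 <= age < 21). The sketch below shows why, using a few illustrative boundary ages that are not from the original data: side='right' places a value that equals an edge after that edge, so each bin is closed on the left.

```python
import numpy as np

# Same edges as above; -inf and inf catch ages outside the stated ranges.
edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])

# Illustrative ages that sit on or next to the edges.
ages = np.array([5.0, 6.0, 10.9, 11.0, 16.0, 21.0, 23.0])

# side='right' returns the insertion index after any equal edge, so an age
# exactly on an edge is counted in the bin that starts at that edge.
stage = edges.searchsorted(ages, side="right") - 1
print(stage)  # [0 1 1 2 3 4 4]
```

An age below 6 gets stage 0 rather than NaN, so those rows may need separate handling if 0 is not a meaningful stage.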

Timing
small data

%%timeit
a = np.array([-np.inf, 6, 11, 16, 21, np.inf])
df.assign(stage=a.searchsorted(df.age, side='right') - 1)
1000 loops, best of 3: 288 µs per loop

%%timeit
df.assign(stage=pd.cut(df.age, bins=[0,11,16,21,300], labels=[1,2,3,4]))
1000 loops, best of 3: 668 µs per loop
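
The numbers above are for the 13-row sample only. To compare the two approaches on a larger frame, something like the following sketch could be used. The frame big, its size, and the random ages are assumptions for illustration, and pd.cut is called with right=False so that both functions produce the same stages. Timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical larger frame: 100,000 random ages in [6, 30).
big = pd.DataFrame({"age": rng.uniform(6, 30, size=100_000)})
edges = np.array([-np.inf, 6, 11, 16, 21, np.inf])


def with_searchsorted():
    return big.assign(stage=edges.searchsorted(big["age"], side="right") - 1)


def with_cut():
    stage = pd.cut(big["age"], bins=[6, 11, 16, 21, np.inf], labels=False, right=False) + 1
    return big.assign(stage=stage)


# Both approaches should agree before their speed is compared.
same = bool((with_searchsorted()["stage"] == with_cut()["stage"]).all())
print("same result:", same)

for fn in (with_searchsorted, with_cut):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```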

edited May 30, 2017 at 18:55

answered May 30, 2017 at 18:52

1 Comment

Vaishali Over a year ago

Updating my bag of useful functions :) +1
