How to replace values greater than specific value in dataframe column?

Question

I have a dataset with some outliers in the AGE field. Here are the unique values of that column, sorted:

unique = df_csv['AGE'].unique()
print (sorted(unique))

[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 79, 126, 140, 149, 152, 228, 235, 267]

How can I replace any value greater than 80 with the mean or median of my Age column?

Quang Hoang · Accepted Answer · 2020-11-21 00:05:28Z

4

Since you want to modify a column in a dataframe, you should resort to loc:

 # replace `median` with `mean` if you want
 df_csv.loc[df_csv['AGE']>80,'AGE'] = df_csv['AGE'].median()
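One thing to keep in mind: the median above is computed over the whole column, outliers included. If you would rather take the median of the plausible ages only, a minimal sketch could look like the following. The sample data is made up for illustration, and the column is created as float so that a fractional median can be stored without a dtype change.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df_csv
df_csv = pd.DataFrame({"AGE": [21, 35, 52, 79, 126, 267]}, dtype="float64")

# Boolean mask marking the rows treated as outliers
too_old = df_csv["AGE"] > 80

# Median of the remaining rows only (an assumption about what is wanted;
# df_csv["AGE"].median() would include the outliers as well)
clean_median = df_csv.loc[~too_old, "AGE"].median()

# Overwrite just the outlier rows in place
df_csv.loc[too_old, "AGE"] = clean_median
print(df_csv["AGE"].tolist())
```

Whether the outliers should count toward the replacement value depends on your data, so treat this as one option rather than the right answer.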


Dani Mesejo · Accepted Answer · 2020-11-20 23:59:25Z

1

You could do (where series is a pandas Series holding the ages):

series[series > 80] = series.median()
print(series)

Output

0     21
1     22
2     23
3     24
4     25
      ..
58    52
59    52
60    52
61    52
62    52
Length: 63, dtype: int64


ombk · Accepted Answer · 2020-11-21 01:44:25Z

0

median = df_csv['AGE'].median()
# apply returns a new Series, so assign the result back to the column
df_csv['AGE'] = df_csv['AGE'].apply(lambda x: median if x > 80 else x)

Other method: Here

edited Nov 21, 2020 at 1:44

answered Nov 20, 2020 at 23:59


1 Comment

ombk Over a year ago

To explain what apply does: a lambda is a function without a name (similar to def ..., but more concise). In lambda x, x stands for each value of the column in turn. After the colon comes the expression: median if x > 80, otherwise x is kept the same. apply goes over every row and performs this check.
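To make that concrete, here is a small sketch with made-up ages showing that the lambda and a named def function give the same result. It also shows Series.mask, a vectorized pandas method that replaces values where a condition is true, as a possible alternative to calling a Python function per row. The name cap_age is just for illustration.

```python
import pandas as pd

# Hypothetical sample ages, not the asker's data
ages = pd.Series([21, 52, 79, 126, 267], dtype="float64")
median = ages.median()

# Named function equivalent to the lambda
def cap_age(x):
    return median if x > 80 else x

via_def = ages.apply(cap_age)
via_lambda = ages.apply(lambda x: median if x > 80 else x)

# Vectorized alternative: replace values where the condition holds
via_mask = ages.mask(ages > 80, median)

print(via_lambda.tolist())
```

All three variants return a new Series and leave ages untouched, so the result still has to be assigned back if you want to keep it.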
