
I have a dataset with some outliers in the age field. Here are the unique values of my data, sorted:

unique = df_csv['AGE'].unique()
print (sorted(unique))

[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 79, 126, 140, 149, 152, 228, 235, 267]

How can I replace any value greater than 80 with the mean or median of my Age column?


3 Answers


Since you want to modify a column in a dataframe, you should resort to loc:

 # replace `median` with `mean` if you want
 df_csv.loc[df_csv['AGE']>80,'AGE'] = df_csv['AGE'].median()
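Note that `df_csv['AGE'].median()` above is computed over the whole column, outliers included. If you would rather not let the outliers influence the replacement value, a variant is to compute the median from the rows at or below 80 first. This is only a sketch: the sample data is made up to stand in for `df_csv`, and `is_outlier` and `clean_median` are names chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for df_csv; only the AGE column matters here.
df_csv = pd.DataFrame({"AGE": [21, 35, 42, 58, 75, 126, 267]})

# Boolean mask marking the rows treated as outliers (threshold 80, as in the question).
is_outlier = df_csv["AGE"] > 80

# Median of the plausible ages only, so the outliers do not skew the replacement value.
clean_median = df_csv.loc[~is_outlier, "AGE"].median()

# Overwrite only the outlier rows, in place.
# If the median were fractional (e.g. 41.5), converting the column with
# df_csv["AGE"].astype(float) beforehand would avoid mixing it into an integer column.
df_csv.loc[is_outlier, "AGE"] = clean_median

print(df_csv["AGE"].tolist())
```

Reusing the mask for both the median and the assignment keeps the threshold in one place.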



You could do:

series[series > 80] = series.median()
print(series)

Output

0     21
1     22
2     23
3     24
4     25
      ..
58    52
59    52
60    52
61    52
62    52
Length: 63, dtype: int64
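If you would prefer not to modify `series` in place, `Series.mask` returns a new Series with the masked positions replaced. The sketch below rebuilds the 63 unique values from the question as the input, which is an assumption about what `series` held; `cleaned` is a name picked for illustration.

```python
import pandas as pd

# The sorted unique ages from the question, used here as a stand-in for `series`.
series = pd.Series(list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267])

# mask() replaces the values where the condition is True and leaves the rest untouched.
# It returns a new Series; the original `series` is not changed.
cleaned = series.mask(series > 80, series.median())

print(cleaned.tail())
```

The original values stay available in `series`, which is handy if you want to compare before and after.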


median = df_csv['AGE'].median()
# apply returns a new Series, so assign the result back to the column
df_csv['AGE'] = df_csv['AGE'].apply(lambda x: median if x > 80 else x)


1 Comment

To explain what apply does: a lambda is a function without a name (similar to def ..., but written inline as a single expression). In lambda x, x stands for each value of the column in turn. After the colon comes the condition: use median if x > 80, else keep x the same. apply goes over every value in the column, performs this check, and returns a new Series with the results.
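To make the lambda explanation concrete, here is a small sketch showing that the lambda and an equivalent named function give the same result. The sample ages and the name `cap_age` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample ages; 90 and 150 play the role of outliers.
ages = pd.Series([25, 30, 40, 90, 150])
median = ages.median()

# Named function: receives one value at a time and returns its replacement.
def cap_age(x):
    return median if x > 80 else x

# apply() calls the function once per value and collects the results in a new Series.
via_def = ages.apply(cap_age)

# The lambda is the same function written inline, without a name.
via_lambda = ages.apply(lambda x: median if x > 80 else x)

print(via_lambda.tolist())
```

Both calls leave `ages` itself unchanged, which is why the result has to be assigned somewhere.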
