new_df['year'].describe()

count    10866.000000
mean      2004.009939
std         14.958790
min       1968.000000
25%       1996.000000
50%       2006.000000
75%       2012.000000
max       2067.000000
Name: year, dtype: float64

It seems the erroneous year values are off by +100 years (i.e., 2067 should probably be 1967). For values above 2018, how do I subtract 100 from the year while leaving the rest of the values untouched?

  • Did an answer below help? Feel free to accept an answer (green tick on left), or ask for clarification. Commented May 8, 2018 at 11:13

1 Answer

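You can use pd.DataFrame.loc with a boolean mask, which selects only the rows where year is above 2018 and updates them in place: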

new_df.loc[new_df['year'] > 2018, 'year'] -= 100
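
To see the effect on a small example, here is a minimal sketch. The sample values are made up for illustration and are not the asker's data; only the `year` column name, the `new_df` name, and the 2018 cutoff come from the question. It also shows `Series.mask` as a possible alternative for when the original column should stay unmodified.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's new_df
new_df = pd.DataFrame({"year": [1996.0, 2067.0, 2012.0, 2068.0, 2018.0]})

# Non-mutating alternative: Series.mask replaces values where the
# condition is True and returns a new Series
shifted = new_df["year"].mask(new_df["year"] > 2018, new_df["year"] - 100)

# In-place version from the answer: only rows matching the mask change
new_df.loc[new_df["year"] > 2018, "year"] -= 100

print(new_df["year"].tolist())  # [1996.0, 1967.0, 2012.0, 1968.0, 2018.0]
```

Note that the comparison is strict, so a value of exactly 2018 is left as it is.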