I am cleaning data that has years in differing formats. There are seven possible values for the year field of my DataFrame: ['2013-14','2014-15','2015-16','2016-17','22017','22018','22019']. I have resolved the problem by manually dealing with each case, as below:
matchups_df.loc[matchups_df['SEASON_ID'] == '22017', 'SEASON_ID'] = '2017-18'
matchups_df.loc[matchups_df['SEASON_ID'] == '22018', 'SEASON_ID'] = '2018-19'
matchups_df.loc[matchups_df['SEASON_ID'] == '22019', 'SEASON_ID'] = '2019-20'
My question is: why does the code below raise the exception ValueError: invalid literal for int() with base 10: '2016-17'? I have pulled the relevant expression out of the np.where and run it on a filtered version of the DataFrame that contains only the values needing conversion, but it raises the same exception. Clearly, I have made some type of syntax error in converting the string to int, but I haven't been able to diagnose where the error lies.
matchups_df.insert(loc = 1, column = 'Season', value = (
    np.where(
        (len(matchups_df.SEASON_ID) == 5),
        (
            (matchups_df.SEASON_ID[1:]) +
            "-" +
            (str((matchups_df.SEASON_ID[3:].astype(int))+1))
        ),
        matchups_df.SEASON_ID
    )
    )
)
SyntaxError rather than ValueError; and b) it would be reported before any attempt at computation.
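For what it's worth, the exception looks like a slicing issue rather than a conversion issue. `matchups_df.SEASON_ID[3:]` slices the Series by row (every row from position 3 onward), not each string by character, so values such as '2016-17' reach `astype(int)` unchanged. Likewise, `len(matchups_df.SEASON_ID)` is the number of rows, so the condition is a single boolean, and `str(...)` stringifies the whole Series rather than each element. Note also that `np.where` evaluates both branches for every row, so the conversion has to be restricted to the five-character rows first. Below is a minimal sketch using the `.str` accessor; the sample DataFrame is made up for illustration, and the zero-padding is an assumption about how seasons such as '2008-09' should be written.

```python
import pandas as pd

# Hypothetical sample standing in for the real matchups_df.
matchups_df = pd.DataFrame(
    {"SEASON_ID": ["2013-14", "2016-17", "22017", "22018", "22019"]}
)

s = matchups_df["SEASON_ID"]

# Element-wise string length; len(s) would give the row count instead.
is_short = s.str.len() == 5

# Work only on the rows that need converting, so that values like
# '2016-17' are never passed to astype(int).
short = s[is_short]
start_year = short.str[1:]                      # '22017' -> '2017'
end_year = (
    (short.str[3:].astype(int) + 1)             # '17' -> 18
    .astype(str)
    .str.zfill(2)                               # assumed: keep two digits
)

season = s.copy()
season.loc[is_short] = start_year + "-" + end_year

matchups_df.insert(loc=1, column="Season", value=season)
print(matchups_df)
```

This sketch does not handle a century rollover (for example '21999' would produce '1999-100'), which does not arise with the seven values listed in the question.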