I have a CSV file similar to this, but with about 155,000 rows covering the years 1910-2010 and 83 different station IDs:
station_id year month element 1 2 3 4 5 6
216565 2008 7 SNOW 0TT 0 0 0 0 0
216565 2008 8 SNOW 0 0T 0 0 0 0
216565 2008 9 SNOW 0 0 0 0 0 0
I want to replace any value that is a number followed by one letter, or a number followed by two letters (for example 0T or 0TT), with NaN.
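To make the target pattern concrete, here is a small sketch using Python's `re` module. It assumes the letters are always uppercase and that the number is one or more digits; `FLAG_PATTERN` and `is_flagged` are hypothetical names used only for illustration.

```python
import re

# Assumed pattern: one or more digits followed by exactly one or two uppercase
# letters. fullmatch() requires the WHOLE value to fit, so "0" and "5ABC" do not match.
FLAG_PATTERN = re.compile(r"[0-9]+[A-Z]{1,2}")


def is_flagged(value: str) -> bool:
    """Return True if the entire value is a number followed by one or two letters."""
    return FLAG_PATTERN.fullmatch(value) is not None


samples = ["0TT", "0T", "0", "12A", "5ABC"]
print({s: is_flagged(s) for s in samples})
# {'0TT': True, '0T': True, '0': False, '12A': True, '5ABC': False}
```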
My desired output is:
station_id year month element 1 2 3 4 5 6
216565 2008 7 SNOW NaN 0 0 0 0 0
216565 2008 8 SNOW 0 NaN 0 0 0 0
216565 2008 9 SNOW 0 0 0 0 0 0
I have tried to use:
replace=df.replace([r'[0-9] [A-Z]'], ['NA'])
replace2=replace.replace([r'[0-9][A-Z][A-Z]'], ['NA'])
I was hoping that the pattern [0-9] [A-Z] would match a number followed by just one letter, and that [0-9][A-Z][A-Z] would match any cell with two letters. However, the file stays exactly the same, even though no errors are raised.
Any help would be much appreciated.
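A likely reason nothing changes: without `regex=True`, `DataFrame.replace` treats each list entry as a literal value to match exactly, so no cell ever equals the string `[0-9] [A-Z]`. Two further details may matter: the space in the first pattern would require a literal space between the digit and the letter, and replacing with `'NA'` inserts the string "NA" rather than a real missing value. Below is a minimal sketch of one possible fix, assuming uppercase letters and using a small hand-built frame in place of the real CSV; `day_cols` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# A few rows mirroring the sample; in practice df would come from pd.read_csv(...).
# Columns that contain flagged values such as "0TT" are read as strings (object dtype).
df = pd.DataFrame({
    "station_id": [216565, 216565, 216565],
    "year": [2008, 2008, 2008],
    "month": [7, 8, 9],
    "element": ["SNOW", "SNOW", "SNOW"],
    "1": ["0TT", "0", "0"],
    "2": ["0", "0T", "0"],
    "3": [0, 0, 0],
})

# regex=True makes the pattern a regular expression rather than a literal value.
# ^ and $ anchor it to the whole cell, so only values that are entirely
# "digits followed by one or two uppercase letters" become a real NaN.
cleaned = df.replace(r"^[0-9]+[A-Z]{1,2}$", np.nan, regex=True)

# The affected day columns still hold strings like "0"; convert them to numbers.
day_cols = ["1", "2", "3"]
cleaned[day_cols] = cleaned[day_cols].apply(pd.to_numeric)
print(cleaned)
```

Anchoring the pattern keeps values such as "SNOW" in the `element` column untouched, and converting the day columns afterwards means later numeric operations treat the replaced cells as missing data rather than text.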