I have a CSV file with several columns, some of which mix letters and numbers. I need to remove the letters, set those values to null, and convert the column to integer, but I get an error. Pandas recently added a nullable integer type (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html), but I still get errors when converting to int. I need to keep the column as int, so I cannot use the workaround of making the column float with NaN in it. The data looks like this:
id count volume
001, A , 1
002, 1 , 2
The count and volume columns contain values like: ' 1 ', ' 2 ', ' A ', ...
I used the re module to remove the letters and whitespace:
df["count"] = df["count"].apply(lambda x: re.sub(r'\s[a-zA-Z]*', '',x))
Now the values in the column look like: '1', '2', '', ...
I tried to convert to 'Int64' but got an error:
df["count"].astype(str).astype('Int64')
TypeError: object cannot be converted to an IntegerDtype
Any suggestions or workarounds?
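You can use pd.to_numeric with errors='coerce', which turns any value that cannot be parsed as a number into NaN:
df['count'] = pd.to_numeric(df['count'], errors='coerce')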
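Note that the to_numeric call above returns a float64 column whenever a value is coerced to NaN, so on its own it does not give the integer column asked for. A minimal sketch of one way to finish the job, assuming a pandas version that has the nullable 'Int64' dtype (0.24 or later): strip the whitespace, coerce to numeric, then cast the result to 'Int64'. The sample DataFrame below is made up to mirror the data in the question, and the regex step is skipped here because errors='coerce' already discards the letter values.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's CSV (not the real file).
df = pd.DataFrame({
    "id": ["001", "002", "003"],
    "count": [" A ", " 1 ", " 2 "],
    "volume": [" 1 ", " 2 ", " B "],
})

for col in ["count", "volume"]:
    # Remove surrounding whitespace so ' 1 ' becomes '1'.
    stripped = df[col].str.strip()
    # Non-numeric strings such as 'A' become NaN; the result is float64 here.
    as_float = pd.to_numeric(stripped, errors="coerce")
    # Cast to the nullable integer dtype; NaN is stored as <NA>.
    df[col] = as_float.astype("Int64")

print(df.dtypes)
print(df)
```

The cast from float to 'Int64' only succeeds when every non-missing value is a whole number; a value like 1.5 would raise an error rather than be silently truncated, which is probably what you want for a count column.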