I'm trying to replace strings with integers in a pandas dataframe. I've already visited here but the solution doesn't work.
Reprex:
import pandas as pd
pd.__version__
> '1.4.1'
test = pd.DataFrame(data = {'a': [None, 'Y', 'N', '']}, dtype = 'string')
test.replace(to_replace = 'Y', value = 1)
> ValueError: Cannot set non-string value '1' into a StringArray.
I know that I could do this individually for each column, either explicitly or using apply, but I am trying to avoid that. I'd ideally replace all 'Y' in the dataframe with int(1), all 'N' with int(0) and all '' with None or pd.NA, so the replace function appears to be the fastest/clearest way to do this.
Convert the column from string type to object type, which will allow you to set mixed datatypes in that column.
Is there a way to convert every string column to object? Or would I have to explicitly hardcode which columns to convert to object? Ideally I'd convert only the columns that need converting, without hardcoding.
for i in test.select_dtypes('string').columns: test[i] = test[i].astype(object)
I ended up converting to object, then using the pandas convert_dtypes() function to back-convert, and it pretty much takes care of everything. Thanks!
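Putting the suggestions from this thread together, here is a minimal sketch of the full round trip, assuming the same one-column frame as the reprex. The names string_cols, converted and result are only illustrative, and the replacement mapping is the one described in the question ('Y' to 1, 'N' to 0, '' to pd.NA). It should behave as described in the comments, but check the resulting dtypes on your own pandas version.

```python
import pandas as pd

# Same frame as in the question's reprex.
test = pd.DataFrame(data={'a': [None, 'Y', 'N', '']}, dtype='string')

# Pick out only the columns with the 'string' dtype, so nothing is hardcoded,
# and cast them to object so they can hold mixed values.
string_cols = test.select_dtypes('string').columns
converted = test.astype({col: object for col in string_cols})

# One replace call over the whole frame; the dict maps old values to new ones.
converted = converted.replace({'Y': 1, 'N': 0, '': pd.NA})

# Ask pandas to infer the best nullable dtypes again (the "back-convert" step).
result = converted.convert_dtypes()

print(result)
print(result.dtypes)
```

Building a new frame with astype and a dict avoids the explicit loop over columns, though the loop shown in the comments does the same job. Columns that were not 'string' to begin with are left alone by the cast, but replace still applies to every column in the frame.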