
I have a CSV file with several columns, some of which contain a mix of letters and numbers. I need to remove the letters, set those values to null, and convert the column to an integer type, but I get an error. Pandas recently added a nullable integer type (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html), but I still get errors when converting to int. The column must stay an integer type, so the usual workaround of storing it as float with NaN is not an option. The data looks like this:

 id    count      volume   
 001,     A   ,       1
 002,     1   ,       2

The count and volume columns contain values like ' 1 ', ' 2 ', ' A ', ...

I used the re module to remove the letters and whitespace:

df["count"] = df["count"].apply(lambda x: re.sub(r'\s[a-zA-Z]*', '',x))

Now the values in the column look like '1', '2', '', ...
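As a quick check of what that substitution does, here is a minimal sketch run on the sample values from the question. The helper name `strip_letters` is made up for illustration. Note that the pattern only removes letters that directly follow a whitespace character, so a value with no leading whitespace is left unchanged.

```python
import re

def strip_letters(value: str) -> str:
    # Same pattern as in the question: each whitespace character,
    # plus any run of ASCII letters right after it, is removed.
    return re.sub(r'\s[a-zA-Z]*', '', value)

samples = [' 1 ', ' 2 ', ' A ']
cleaned = [strip_letters(s) for s in samples]
print(cleaned)

# Letters not preceded by whitespace are NOT removed by this pattern.
unpadded = strip_letters('A')
print(repr(unpadded))
```

The empty strings left behind are what the later integer conversion trips over, since '' is not a valid integer.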

I tried to convert the column to 'Int64' but got an error:

  df["count"].astype(str).astype('Int64')

TypeError: object cannot be converted to an IntegerDtype

Any suggestions or workarounds?

  • df['count'] = pd.to_numeric(df['count'], errors='coerce') Commented Jan 17, 2020 at 0:55
  • df['count'] = pd.to_numeric(df['count'], errors='coerce').astype('Int64') finally worked. Commented Jan 17, 2020 at 5:03

1 Answer

 df['count'] = pd.to_numeric(df['count'], errors='coerce').astype('Int64')
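For context, a minimal sketch of that one-liner on data shaped like the question's sample. The DataFrame contents below are made up to mirror the question, and the `.str.strip()` call is an extra precaution added here, not part of the original answer. `pd.to_numeric(..., errors='coerce')` turns anything it cannot parse into NaN, which gives a float column; `.astype('Int64')` then converts that to the nullable integer dtype, where the missing entries show up as `<NA>`.

```python
import pandas as pd

# Hypothetical sample mirroring the question: padded strings, one non-numeric.
df = pd.DataFrame({
    "id": ["001", "002"],
    "count": ["     A   ", "     1   "],
    "volume": ["       1", "       2"],
})

# Strip surrounding whitespace first (an added precaution, see above).
stripped = df["count"].str.strip()

# Unparseable values such as 'A' become NaN, so the result is float64.
as_float = pd.to_numeric(stripped, errors="coerce")

# Convert the float column (with NaN) to the nullable integer dtype.
df["count"] = as_float.astype("Int64")

print(df["count"])
print(df["count"].dtype)
```

With this approach the regex step from the question is not needed, because non-numeric values are coerced to missing directly.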

1 Comment

Please always put your answer in context instead of just pasting code. See here for more details.
