
Say I import a CSV into pandas and realize there are some non-numeric values in a column that I expect to be all numeric.

This is how I would find those values (in a dataframe called df, in a column called should_be_numbers):

df[pd.to_numeric(df['should_be_numbers'], errors='coerce').isnull()]['should_be_numbers']

My question: Is there a cleaner/more pythonic/less clunky way to do this?
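For reference, a self-contained sketch of the approach above, using made-up sample data (the "-.--" value stands in for whatever the bad entries look like):

```python
import pandas as pd

# Hypothetical sample frame; '-.--' is a placeholder for a bad value
df = pd.DataFrame({'should_be_numbers': ['1', '2.5', '-.--', '3']})

# errors='coerce' turns anything non-convertible into NaN,
# so the NaN mask picks out exactly the non-numeric entries
mask = pd.to_numeric(df['should_be_numbers'], errors='coerce').isna()
bad = df.loc[mask, 'should_be_numbers']
print(bad.tolist())
```

Note that coercion also turns values that were already null into NaN, so if the column can contain genuine NaNs you may want to combine the mask with `df['should_be_numbers'].notna()` to see only the truly malformed entries.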

  • pd.read_csv has a dtype arg where you can specify the data type of the column. I'm assuming you have a column being stored as a string and you're getting scientific-notation values. Commented Aug 1, 2022 at 18:21
  • Nope, in this case it's floats where the null values are all something like "-.--", but I don't know the data well enough to assume all the null values are filled in like that, and I don't want to blindly coerce in case it's some weird formatting on valid data! Commented Aug 1, 2022 at 18:23

1 Answer

import numpy as np
import pandas as pd

df = pd.DataFrame({'should_be_numbers': [1, 22, 'A', 'BB', [1, 22], ['A', 'BB'], 'A1BB22', np.nan, 3.13]})
df[[not isinstance(value, (int, float)) for value in df.should_be_numbers]]

Input:

  should_be_numbers
0                 1
1                22
2                 A
3                BB
4           [1, 22]
5           [A, BB]
6            A1BB22
7               NaN
8              3.13

Output:

  should_be_numbers
2                 A
3                BB
4           [1, 22]
5           [A, BB]
6            A1BB22
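One caveat worth noting (a sketch, not part of the answer above): a plain isinstance check against int and float can miss NumPy integer scalars, because np.int64 is not a subclass of Python's int. If the column might hold NumPy scalars, the numbers.Number ABC covers both the builtin and the NumPy numeric types:

```python
import numbers

import numpy as np
import pandas as pd

# Hypothetical mixed column containing a NumPy scalar
df = pd.DataFrame({'should_be_numbers': [np.int64(1), 'A', 2.5, 'BB']})

# np.int64 is not an instance of int, so isinstance(v, (int, float))
# would wrongly flag it; numbers.Number matches it correctly
mask = [not isinstance(v, numbers.Number) for v in df['should_be_numbers']]
bad = df.loc[mask, 'should_be_numbers']
print(bad.tolist())
```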
