3

I have a fairly large pandas dataframe (11k rows and 20 columns). One column has a mixed data type, mostly numeric (float) with a handful of strings scattered throughout.

I subset this dataframe by querying other columns before performing some statistical analysis using the data in the mixed column (but can't do this if there's a string present). 99% of the time once subsetted this column is purely numeric, but rarely a string value will end up in the subset, which I need to trap.

What's the most efficient/pythonic way of looping through a Pandas mixed type column to check for strings (or conversely check whether the whole column is full of numeric values or not)?

If there is even a single string present in the column I want to raise an error, otherwise proceed.

3
  • 1
    Can you simply try to cast it to a float? df['col_test'].astype(float) Commented Mar 7, 2018 at 11:46
  • yes, I'm already doing that and catching type and value errors I was just looking for an alternative way, but I may already have the solution. I was wondering if pandas had a built in method for example (although I can't find it in the docs) Commented Mar 7, 2018 at 11:56
  • plus I should add that casting will tell me there's a problem but not necessarily how many strings or their indexes which would be my next step potentially Commented Mar 7, 2018 at 11:57

2 Answers 2

3

This is one way. I'm not sure it can be vectorised.

import pandas as pd

df = pd.DataFrame({'A': [1, None, 'hello', True, 'world', 'mystr', 34.11]})

df['stringy'] = [isinstance(x, str) for x in df.A]

#        A stringy
# 0      1   False
# 1   None   False
# 2  hello    True
# 3   True   False
# 4  world    True
# 5  mystr    True
# 6  34.11   False
Sign up to request clarification or add additional context in comments.

Comments

0

Here's a different way. It converts the values of column A to numeric, but does not fail on errors: strings are replaced by NA. The notnull() is there to remove these NA.

df = df[pd.to_numeric(df.A, errors='coerce').notnull()]

However, if there were NAs in the column already, they too will be removed.

See also: Select row from a DataFrame based on the type of the object(i.e. str)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.