I have this Pandas dataframe (df):

     A    B
0    1    green
1    2    red
2    s    blue
3    3    yellow
4    b    black

The dtype of column A is object.

I'd like to select the rows where the value in A is an integer or numeric, to get:

     A    B
0    1    green
1    2    red
3    3    yellow

Thanks

4 Answers

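Call apply on the dataframe (note the double square brackets df[['A']] rather than df['A']) with axis=1, so that the lambda function runs row-wise, and call the string method isdigit() on each row's single value (x.iloc[0], converted with str() so that rows holding actual ints don't raise AttributeError). The result is a boolean Series aligned on df's index, which is then used as a mask to select the matching rows.

In [66]:
df[df[['A']].apply(lambda x: str(x.iloc[0]).isdigit(), axis=1)]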
Out[66]:
       A       B
Index           
0      1   green
1      2     red
3      3  yellow
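The same boolean mask can also be built element-wise on the single column, without the double brackets or axis=1. A minimal sketch, assuming A holds strings as in the question (the sample frame below is reconstructed for illustration); printing the mask shows exactly what is used to index df.

```python
import pandas as pd

# Hypothetical sample mirroring the question; values assumed stored as strings
df = pd.DataFrame({'A': ['1', '2', 's', '3', 'b'],
                   'B': ['green', 'red', 'blue', 'yellow', 'black']})

# Element-wise apply on one column: one bool per row, indexed like df
mask = df['A'].apply(lambda v: str(v).isdigit())
print(mask.tolist())  # [True, True, False, True, False]

# The boolean Series keeps only the rows where the mask is True
result = df[mask]
print(result)
```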

Update

If you're using pandas 0.16.0 or newer, the following also works:

In [6]:
df[df['A'].astype(str).str.isdigit()]

Out[6]:
   A       B
0  1   green
1  2     red
3  3  yellow

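Here we cast the Series to str using astype and then call the vectorised str.isdigit. Note that isdigit only accepts strings made entirely of digits, so values such as '-3' or '1.5' are rejected.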

Also note that convert_objects is deprecated; for pandas 0.17.0 or newer, use to_numeric instead.
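Since the question asks for "integer or numeric" values, it may help to see where str.isdigit and to_numeric disagree. A small sketch on hypothetical values (not taken from the question): isdigit only accepts all-digit strings, while to_numeric with errors='coerce' accepts anything pandas can parse as a number.

```python
import pandas as pd

# Hypothetical values, not from the question
s = pd.Series(['1', '-3', '1.5', 's'])

digits_only = s.str.isdigit()                         # every character must be a digit
parses = pd.to_numeric(s, errors='coerce').notnull()  # anything pandas parses as a number

print(digits_only.tolist())  # [True, False, False, False]
print(parses.tolist())       # [True, True, True, False]
```

Which one is right depends on whether negative or decimal values should count as a match.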

2 Comments

It works perfectly. I tried using df.apply(lambda x: isinstance(df[A], (int,float)) , axis=1), but it always returns False. Your function works better.
The first solution doesn't work for me, but the second one does (pandas version 0.24.1).
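The isinstance attempt in the first comment tests the whole column (df[A]) rather than each value, so it can never be True. A minimal sketch of a per-value version, assuming A holds real Python numbers mixed with strings (a hypothetical frame, not the question's data); numbers.Number is used in place of (int, float) so NumPy scalar types also match. If A holds numeric strings such as '1', this check is False everywhere, which is why the string-based answers are needed.

```python
import numbers

import pandas as pd

# Hypothetical frame where A holds actual ints mixed with strings
df = pd.DataFrame({'A': [1, 2, 's', 3, 'b'],
                   'B': ['green', 'red', 'blue', 'yellow', 'black']})
print(df['A'].dtype)  # object

# Test each value, not the Series itself
mask = df['A'].map(lambda v: isinstance(v, numbers.Number))
print(df[mask])

# With numeric strings, no value is a Number, so every entry is False
str_mask = df['A'].astype(str).map(lambda v: isinstance(v, numbers.Number))
print(str_mask.any())  # False
```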

You can use convert_objects, which, when convert_numeric=True, forcibly sets all non-numeric values to NaN. Dropping those rows and indexing with the remaining index gives your result.

This will be considerably faster than using apply on a larger frame, as it is all implemented in Cython.

In [30]: df[['A']].convert_objects(convert_numeric=True)
Out[30]: 
    A
0   1
1   2
2 NaN
3   3
4 NaN

In [31]: df[['A']].convert_objects(convert_numeric=True).dropna()
Out[31]: 
   A
0  1
1  2
3  3

In [32]: df[['A']].convert_objects(convert_numeric=True).dropna().index
Out[32]: Int64Index([0, 1, 3], dtype='int64')

In [33]: df.loc[df[['A']].convert_objects(convert_numeric=True).dropna().index]
Out[33]: 
   A       B
0  1   green
1  2     red
3  3  yellow
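convert_objects is not available in current pandas releases. A minimal sketch of the same convert, drop, and index chain using pd.to_numeric(errors='coerce'), assuming A holds strings as in the question (sample frame reconstructed for illustration). It selects with .loc because dropna().index contains row labels, not positions.

```python
import pandas as pd

# Hypothetical sample mirroring the question
df = pd.DataFrame({'A': ['1', '2', 's', '3', 'b'],
                   'B': ['green', 'red', 'blue', 'yellow', 'black']})

# Same idea as convert_objects(convert_numeric=True): non-numeric becomes NaN
converted = pd.to_numeric(df['A'], errors='coerce')

# Drop the NaNs and use the surviving labels to pick rows from df
keep = converted.dropna().index
result = df.loc[keep]
print(result)
```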

Note that convert_objects is deprecated:

>>> df[['A']].convert_objects(convert_numeric=True)
__main__:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

From 0.17.0 onward, use pd.to_numeric with errors='coerce' so that values that fail to parse become NaN. Then use notnull to build a boolean mask to apply to the original dataframe:

>>> df[pd.to_numeric(df.A, errors='coerce').notnull()]
   A       B
0  1   green
1  2     red
3  3  yellow
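The to_numeric mask keeps any parseable number, including decimals. If only whole numbers should match, one option is to add a remainder check. A sketch on a hypothetical frame (the '2.5' and '-4' values are invented to show the difference); NaN % 1 is NaN, which compares unequal to 0, so unparseable rows drop out as well.

```python
import pandas as pd

# Hypothetical sample: '2.5' and '-4' added to show the difference
df = pd.DataFrame({'A': ['1', '2.5', 's', '3', '-4'],
                   'B': ['green', 'red', 'blue', 'yellow', 'black']})

num = pd.to_numeric(df['A'], errors='coerce')  # 1.0, 2.5, NaN, 3.0, -4.0

numeric_rows = df[num.notnull()]                   # any parseable number
integer_rows = df[num.notnull() & (num % 1 == 0)]  # whole numbers only

print(numeric_rows.index.tolist())  # [0, 1, 3, 4]
print(integer_rows.index.tolist())  # [0, 3, 4]
```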

Personally, I think it is more succinct to just use the built-in map instead of .apply():

In [12]: pred = lambda v: str(v).isdigit()

In [13]: df[list(map(pred, df['A']))]

1 Comment

More succinct does not mean better. Many of pandas' built-in functions use optimizations that plain Python functions like map may not be able to take advantage of.
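For comparison, a minimal sketch of the built-in map mask next to pandas-native alternatives. pred is a hypothetical predicate (the answer leaves it undefined), and the index labels are made up to show that a plain list matches rows by position while a boolean Series carries df's labels. All three select the same rows here; which is fastest depends on your data and pandas version, so it is worth timing on your own frame.

```python
import pandas as pd

# Hypothetical sample with a non-default index
df = pd.DataFrame({'A': ['1', '2', 's', '3', 'b'],
                   'B': ['green', 'red', 'blue', 'yellow', 'black']},
                  index=[10, 11, 12, 13, 14])

def pred(value):
    # Hypothetical predicate: True when the value looks like a non-negative integer
    return str(value).isdigit()

# Built-in map returns an iterator in Python 3, so materialise it;
# a plain list of bools is matched to rows by position
via_builtin = df[list(map(pred, df['A']))]

# Series.map returns a boolean Series carrying df's index labels
via_series_map = df[df['A'].map(pred)]

# pandas' vectorised string method, no user-defined predicate
via_str = df[df['A'].astype(str).str.isdigit()]

print(via_builtin.index.tolist())     # [10, 11, 13]
print(via_series_map.index.tolist())  # [10, 11, 13]
print(via_str.index.tolist())         # [10, 11, 13]
```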
