Drop non-numeric columns from a pandas DataFrame [duplicate]

Question

In my application I load text files that are structured as follows:

First, a non-numeric column (ID)
A number of non-numeric columns (strings)
A number of numeric columns (floats)

The number of non-numeric columns varies. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices, since this should be doable by reading their dtypes. Is this possible with pandas, or do I have to cook up something on my own?
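To make the setup concrete, here is a minimal sketch that loads a small in-memory file with the layout described above. The column names and values are invented for illustration, and io.StringIO stands in for inputfile. It shows that pandas infers a dtype per column, which is what the answers below rely on.

```python
import io

import pandas as pd

# Hypothetical tab-separated file with the described layout:
# an ID column, two string columns, then two float columns.
raw = (
    "ID\tname\tcity\tx\ty\n"
    "a1\tfoo\tRome\t1.5\t2.0\n"
    "a2\tbar\tOslo\t3.25\t4.0\n"
)

# index_col=0 turns the ID column into the index, as in the question.
source = pd.read_table(io.StringIO(raw), index_col=0)

# Each column gets an inferred dtype: the text columns get a non-numeric
# dtype (object in most pandas versions), the others float64.
print(source.dtypes)
```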

Related: stackoverflow.com/q/25039626/5069869

– Bernhard, commented Oct 14, 2016 at 9:52

sapo_cosmico · Answer · 2017-04-27 11:21:27Z

73

To avoid using a private method, you can also use select_dtypes, which lets you either include or exclude the dtypes you want.

I came across it in this post about the exact same problem.

In your case, specifically:
source.select_dtypes(['number']) or source.select_dtypes([np.number])
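A minimal sketch of that call on a small made-up DataFrame (the column names are hypothetical). It shows that select_dtypes returns a new DataFrame rather than a boolean mask, and that 'number' and np.number select the same columns. Note that the boolean column is left out, since NumPy's bool_ is not a subtype of np.number.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: text, int, float and bool columns.
source = pd.DataFrame({
    "name": ["foo", "bar"],
    "count": [1, 2],
    "x": [1.5, 2.5],
    "flag": [True, False],
})

# Both spellings keep only the numeric columns and return a DataFrame.
numeric = source.select_dtypes(include="number")
numeric_np = source.select_dtypes(include=[np.number])

print(list(numeric.columns))
```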

edited Apr 27, 2017 at 11:21

answered Sep 4, 2015 at 13:55

sapo_cosmico

4 Comments

hardsetting Over a year ago

I think this is better than using the private method. Maybe you should add the direct answer to the question, which is: source.select_dtypes(['number']) or source.select_dtypes([numpy.number])

Juan Antonio Gomez Moriano Over a year ago

This should be the accepted answer. Although the other one works too, this one is more correct, not to mention that the private method, not being part of the API, might change at any time.

Worthy7 Over a year ago

Doesn't this return booleans? Also, what is the difference between 'number' and np.number (just a numpy array of numbers?)

arainchi Feb 16 at 6:56

I'd use something like this to select only non-object/numeric columns: source.select_dtypes(exclude=['object'])
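Excluding 'object' is not quite the same as keeping numeric columns: every other dtype survives, including bool and datetime columns. The sketch below uses a hypothetical frame to contrast the two calls. The text column is forced to dtype object here because, depending on the pandas version and settings, text columns may use a dedicated string dtype instead, which exclude=['object'] would not catch.

```python
import pandas as pd

# Hypothetical frame with object, float, bool and datetime columns.
df = pd.DataFrame({
    "name": pd.Series(["foo", "bar"], dtype=object),
    "x": [1.5, 2.5],
    "flag": [True, False],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
})

# Excluding only 'object' keeps the bool and datetime columns too.
kept = df.select_dtypes(exclude=["object"])

# Including 'number' keeps only the genuinely numeric column.
numeric = df.select_dtypes(include="number")

print(list(kept.columns), list(numeric.columns))
```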

Wouter Overmeire · Accepted Answer · 2012-10-04 11:41:00Z

53

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2

answered Oct 4, 2012 at 11:41

Wouter Overmeire

2 Comments

Richard Herron Over a year ago

Thanks! Are there any precautions to take when using "private methods" in pandas? Or, alternatively, why is this private? (I can open a new question, if you suggest.)

Wouter Overmeire Over a year ago

In general, adding, removing, or changing the API of a private method is not considered a (class) API/behavior change. In other words, a new version of pandas that is considered backwards compatible could, e.g., remove a private method. I believe _get_numeric_data() is mainly used to support plotting functions/methods. If you feel this is a useful method, you can file a feature request on GitHub asking to make it part of the public API.

Thomas Gotwig · Answer · 2019-03-03 11:43:50Z

0

This removes every column whose dtype is not float64 (note that this also drops integer columns).

import pandas as pd

df = pd.read_csv('sample.csv', index_col=0)
# Collect the names of all columns whose dtype is not float64
non_floats = []
for col in df:
    if df[col].dtype != "float64":
        non_floats.append(col)
df = df.drop(columns=non_floats)

answered Mar 3, 2019 at 11:43

Thomas Gotwig

1 Comment

Uzay Macar Over a year ago

You can also use pd.api.types.is_numeric_dtype(df[col]).
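A sketch of the loop above rewritten with that check, on a hypothetical in-memory frame instead of sample.csv. Unlike the float64 comparison, it keeps integer columns as well. Be aware that is_numeric_dtype also treats bool columns as numeric, unlike select_dtypes('number').

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame standing in for the contents of sample.csv.
df = pd.DataFrame({
    "name": ["foo", "bar"],
    "count": [1, 2],
    "x": [1.5, 2.5],
})

# Collect every column whose dtype is not numeric (int, float, ...).
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
df = df.drop(columns=non_numeric)

print(list(df.columns))
```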

Community · Answer · 2019-05-14 15:19:49Z

-1

I also have another possible solution for dropping columns with categorical values in two lines of code: the first line defines a list of the categorical columns, and the second line drops them. df is our DataFrame.

[image: df before dropping]

  to_be_dropped=pd.DataFrame(df.categorical).columns
  df= df.drop(to_be_dropped,axis=1)

[image: df after dropping]

edited May 14, 2019 at 15:19

Community Bot

answered Aug 3, 2018 at 12:12

Luigi Bungaro

1 Comment

information_interchange Over a year ago

Doesn't work: AttributeError: 'DataFrame' object has no attribute 'categorical'
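One way to make the intended two-line approach work is to select the columns by dtype, since DataFrame has no .categorical attribute. This sketch assumes "categorical" means columns of dtype object or category; the frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a text column, a categorical column and a float column.
df = pd.DataFrame({
    "name": pd.Series(["foo", "bar"], dtype=object),
    "grade": pd.Categorical(["a", "b"]),
    "x": [1.5, 2.5],
})

# Pick the object/category columns by dtype, then drop them by name.
to_be_dropped = df.select_dtypes(include=["object", "category"]).columns
df = df.drop(to_be_dropped, axis=1)

print(list(df.columns))
```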