
In my application I load text files that are structured as follows:

  • First non numeric column (ID)
  • A number of non-numeric columns (strings)
  • A number of numeric columns (floats)

The number of the non-numeric columns is variable. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices, since this should be doable by reading their dtypes. Is this possible with pandas, or do I have to cook up something on my own?
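To make the layout concrete, here is a minimal sketch that loads an in-memory stand-in for such a file. The column names (`id`, `name`, `city`, `x`, `y`) and values are hypothetical; the real file may have any number of string columns.

```python
import io
import pandas as pd

# Hypothetical sample file mimicking the described layout:
# an ID column, a variable number of string columns, then float columns.
raw = io.StringIO(
    "id\tname\tcity\tx\ty\n"
    "a1\tfoo\tRome\t1.5\t2.0\n"
    "a2\tbar\tOslo\t3.25\t4.0\n"
)

# read_table defaults to tab-separated input; index_col=0 makes "id" the index.
source = pd.read_table(raw, index_col=0)
print(source.dtypes)
```

The string columns come back with a non-numeric dtype and the last two with `float64`, which is what the answers below rely on.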


4 Answers


To avoid using a private method, you can also use select_dtypes, which lets you either include or exclude the dtypes you want.

I ran into it in this post on the exact same thing.

Or, in your case, specifically:
source.select_dtypes(['number']) or source.select_dtypes([np.number])
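A minimal sketch of both forms on a small hypothetical frame (the column names are invented for illustration). `select_dtypes` returns a new DataFrame holding only the matching columns, so it can also be used to find the columns to drop.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one string column, one int column, one float column.
source = pd.DataFrame({
    "name": ["foo", "bar"],
    "count": [1, 2],
    "value": [0.5, 1.5],
})

# 'number' and np.number both match every integer and float dtype.
numeric = source.select_dtypes(include=["number"])
numeric_np = source.select_dtypes(include=[np.number])

# Equivalent result by dropping everything that is not numeric.
dropped = source.drop(columns=source.select_dtypes(exclude=["number"]).columns)

print(list(numeric.columns))  # ['count', 'value']
```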


4 Comments

I think this is better than using the private method. Maybe you should add the direct answer to the question, which is: source.select_dtypes(['number']) or source.select_dtypes([numpy.number])
This should be the accepted answer, although the other one will work too, this is more correct, not to mention that the private method, not being part of the api, might change at any time
Doesn't this return booleans? Also, what is the difference between 'number' and np.number (just a numpy array of numbers?)
I'd use something like this to select non-object/numeric columns only: source.select_dtypes(exclude=['object'])
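A sketch addressing the two questions above, on a hypothetical frame. It shows that `select_dtypes` returns a DataFrame (a column subset), not booleans, and that `exclude=['object']` is looser than `include='number'`: bool and datetime columns survive the former but not the latter. The string column is created with object dtype explicitly, since newer pandas versions may infer a dedicated string dtype for text instead, which `exclude=['object']` would then not remove.

```python
import pandas as pd

# Hypothetical frame mixing several dtypes; "name" is forced to object dtype.
source = pd.DataFrame({
    "name": pd.Series(["foo", "bar"], dtype=object),
    "count": [1, 2],
    "flag": [True, False],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
})

# select_dtypes returns a DataFrame containing only the selected columns.
numeric_only = source.select_dtypes(include="number")
not_object = source.select_dtypes(exclude=["object"])

print(list(numeric_only.columns))  # ['count']
print(list(not_object.columns))    # ['count', 'flag', 'when']
```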

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2
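For comparison, a sketch of the public-API route on the same frame as in the session above; `select_dtypes(include='number')` should likewise keep only column B. One difference worth checking on your pandas version: the private method may also keep bool columns, which `select_dtypes(include='number')` excludes.

```python
import pandas as pd

# Same frame as in the IPython session above.
source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1, 2), (3, 4)]})

# Public-API way to keep only the numeric columns.
numeric = source.select_dtypes(include='number')
print(numeric)
```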

2 Comments

Thanks! Are there any precautions in using "private methods" in pandas? Or, alternatively, why is this private? (I can open a new question, if you suggest.)
In general, adding, removing, or changing the API of a private method is not considered a (class) API/behavior change. In other words, a new version of pandas that is considered backwards compatible could, e.g., remove a private method. I believe _get_numeric_data() is mainly used to support plotting functions/methods. If you feel this is a useful method, you can file a feature request on GitHub asking to make it part of the public API.

This removes every column whose dtype is not float64 (note that this also drops integer columns).

import pandas as pd

df = pd.read_csv('sample.csv', index_col=0)
# Collect the names of all columns whose dtype is not float64
non_floats = []
for col in df:
    if df[col].dtype != "float64":
        non_floats.append(col)
df = df.drop(columns=non_floats)

1 Comment

You can also use pd.api.types.is_numeric_dtype(df[col]).
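A sketch of the comment's suggestion applied to the loop above, on a hypothetical frame. With `is_numeric_dtype` the integer column is kept instead of being dropped by a float64-only check. Note that `is_numeric_dtype` treats bool as numeric, unlike `select_dtypes(include='number')`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame with a string, an int, and a float column.
df = pd.DataFrame({
    "name": ["foo", "bar"],
    "count": [1, 2],
    "value": [0.5, 1.5],
})

# Collect the names of columns whose dtype is not numeric, then drop them.
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
df = df.drop(columns=non_numeric)
print(list(df.columns))  # ['count', 'value']
```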

I also have another possible solution for dropping the columns with categorical values in two lines of code: first define a list of the columns with categorical values, then drop them. df is our DataFrame.

df before dropping: [screenshot not available]

  to_be_dropped=pd.DataFrame(df.categorical).columns
  df= df.drop(to_be_dropped,axis=1)

df after dropping: [screenshot not available]

1 Comment

Doesn't work: AttributeError: 'DataFrame' object has no attribute 'categorical'
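A sketch of a working version of that answer's two-line idea, on a hypothetical frame. DataFrame has no `categorical` attribute; here `select_dtypes(exclude="number")` is swapped in to find the columns to drop, which covers both plain string columns and `category` columns.

```python
import pandas as pd

# Hypothetical frame with a string column, a categorical column,
# and a numeric column.
df = pd.DataFrame({
    "name": ["foo", "bar"],
    "grade": pd.Categorical(["a", "b"]),
    "value": [0.5, 1.5],
})

# Labels of every non-numeric column (strings and categoricals alike).
to_be_dropped = df.select_dtypes(exclude="number").columns
df = df.drop(to_be_dropped, axis=1)
print(list(df.columns))  # ['value']
```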
