18

In short ... I have a Python Pandas data frame that is read in from an Excel file using 'read_table'. I would like to keep a handful of the series from the data and purge the rest. I know that I can delete what I don't want one by one using del data['SeriesName'], but I'd rather specify what to keep instead of specifying what to delete.

If the simplest answer is to copy the existing data frame into a new data frame that contains only the series I want, and then delete the existing frame in its entirety, I would be satisfied with that solution ... but if that is indeed the best way, can someone walk me through it?

TIA ... I'm a newb to Pandas. :)

3 Answers

37

You can use the DataFrame drop function to remove columns. You have to pass the axis=1 option for it to work on columns and not rows. Note that it returns a copy so you have to assign the result to a new DataFrame:

In [1]: from pandas import *

In [2]: df = DataFrame(dict(x=[0,0,1,0,1], y=[1,0,1,1,0], z=[0,0,1,0,1]))

In [3]: df
Out[3]:
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1

In [4]: df = df.drop(['x','y'], axis=1)

In [5]: df
Out[5]:
   z
0  0
1  0
2  1
3  0
4  1
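
If it is easier to list what to keep than what to drop, the drop list can be computed from the keep list. The following is only a sketch under assumptions: the name `keep` and its contents are made up for illustration, and the `columns=` keyword (equivalent to passing `axis=1`) needs a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1],
                   "y": [1, 0, 1, 1, 0],
                   "z": [0, 0, 1, 0, 1]})

# Hypothetical list of the columns to retain.
keep = ["z"]

# Every column label that is NOT in `keep`.
to_drop = df.columns.difference(keep)

# drop() returns a new DataFrame; `df` itself is left unchanged.
trimmed = df.drop(columns=to_drop)
print(trimmed)
```

This way the drop list follows the input file automatically when its set of columns changes.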

2 Comments

This does indeed work well, but in this instance I only need to keep about 5-6 out of 40-50 series of data, and the series I want to drop may fluctuate based on changes in the input data file. Good to learn usage of the .drop function though - thanks!
I just had to do something similar to what you've done, and in my case, I've pre-computed the list of things I need to drop, and then passed in the list to the drop() function. Worked like a charm!
15

Basically the same as Zelazny7's answer -- just specifying what to keep:

In [68]: df
Out[68]: 
   x  y  z
0  0  1  0
1  0  0  0
2  1  1  1
3  0  1  0
4  1  0  1

In [70]: df = df[['x','z']]                                                                

In [71]: df
Out[71]: 
   x  z
0  0  0
1  0  0
2  1  1
3  0  0
4  1  1
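
If the input file does not always contain every column you want, selecting with a plain list can fail on a missing label. One possible alternative, sketched here with a made-up `wanted` list, is `DataFrame.filter(items=...)`, which keeps only the listed labels that actually exist:

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1],
                   "y": [1, 0, 1, 1, 0],
                   "z": [0, 0, 1, 0, 1]})

# Hypothetical keep list; "not_in_file" stands in for a column
# that happens to be absent from this particular input.
wanted = ["x", "z", "not_in_file"]

# filter(items=...) silently skips labels that are not present.
kept = df.filter(items=wanted)
print(kept)
```

Whether silently skipping a missing column is acceptable depends on the data; if a missing column should be an error, `df[wanted]` is the stricter choice.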

*Edit*

You can specify a large number of columns by indexing/slicing into the DataFrame.columns object.
This object, of type pandas.Index, can be viewed as an ordered, array-like sequence of column labels (with some extended functionality).

See this extension of above examples:

In [4]: df.columns
Out[4]: Index([x, y, z], dtype=object)

In [5]: df[df.columns[1:]]
Out[5]: 
   y  z
0  1  0
1  0  0
2  1  1
3  1  0
4  0  1

In [7]: df.drop(df.columns[1:], axis=1)
Out[7]: 
   x
0  0
1  0
2  1
3  0
4  1
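
The same kind of slice can also be taken by label instead of by position. A minimal sketch, assuming the columns of interest sit next to each other in the frame: `.loc` accepts a label slice, and unlike positional slicing it includes both endpoints.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 0, 1, 0, 1],
                   "y": [1, 0, 1, 1, 0],
                   "z": [0, 0, 1, 0, 1]})

# Keep a contiguous block of columns, "y" through "z" inclusive.
by_label = df.loc[:, "y":"z"]

# Drop that same block by reusing the labels the slice selected.
without_block = df.drop(columns=df.loc[:, "y":"z"].columns)

print(by_label.columns.tolist())
print(without_block.columns.tolist())
```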

3 Comments

@Theodros Zelleke, what if I had about 50 columns I want to drop and 50 columns I want to keep, and the number of columns can change each time I run it? Is there a way to do some sort of df.drop(colname1:colname50), i.e. dropping chunks of columns at a time?
@Theodros Zelleke, thanks for the extra information. What about dropping by label names rather than column numbers? So, in your example, dropping ['y':'z'].
Please select this as the answer to your question. And I know it's been a while, but @zelazny7's answer contains a bad practice when importing a large package: from pandas import *
1

You can also specify a list of columns to keep with the usecols option in pandas.read_table. This speeds up the loading process as well.
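
As a rough illustration, here is a sketch that reads a small in-memory, tab-separated table; the `io.StringIO` buffer and its contents are invented stand-ins for the real input file.

```python
import io
import pandas as pd

# Invented tab-separated data standing in for the real file.
raw = io.StringIO("x\ty\tz\n0\t1\t0\n0\t0\t0\n1\t1\t1\n")

# Only the columns named in usecols are parsed; "y" is never loaded.
data = pd.read_table(raw, usecols=["x", "z"])
print(data)
```

The resulting columns come back in file order, not in the order given to usecols.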
