Find Pandas dataframe column based on values, in Python

Question

I have 2 Pandas Dataframes in Python. Here they are:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,3),columns=list('ABC'))
df2 = pd.DataFrame(np.random.rand(10,3),columns=list('ABC'))
df['A'] = 1
print(df)
print(df2)

   A         B         C
0  1  0.333141  0.803991
1  1  0.043958  0.582038
2  1  0.833433  0.782856
3  1  0.722592  0.237912
4  1  0.634979  0.664208
5  1  0.809748  0.889524
6  1  0.110342  0.650617
7  1  0.035417  0.251089
8  1  0.481492  0.128792
9  1  0.190135  0.213608

          A         B         C
0  0.897373  0.599721  0.361668
1  0.495024  0.471351  0.090395
2  0.651174  0.621328  0.721208
3  0.253459  0.567619  0.104370
4  0.357627  0.616717  0.775327
5  0.164323  0.716166  0.740565
6  0.841509  0.464837  0.398952
7  0.398680  0.186555  0.293076
8  0.298785  0.784237  0.704184
9  0.124763  0.384852  0.307361

As you can see in df, there is one column with only 1's.

I need to do the following:

Find the name of the column in a Dataframe (df) that contains only 1's in all rows.
Drop that column from df
Drop that SAME column from df2

I would like to get this:

          B         C
0  0.333141  0.803991
1  0.043958  0.582038
2  0.833433  0.782856
3  0.722592  0.237912
4  0.634979  0.664208
5  0.809748  0.889524
6  0.110342  0.650617
7  0.035417  0.251089
8  0.481492  0.128792
9  0.190135  0.213608

          B         C
0  0.599721  0.361668
1  0.471351  0.090395
2  0.621328  0.721208
3  0.567619  0.104370
4  0.616717  0.775327
5  0.716166  0.740565
6  0.464837  0.398952
7  0.186555  0.293076
8  0.784237  0.704184
9  0.384852  0.307361

Is there a way to do this?

maxymoo · Answer · 2015-06-02 03:18:35Z

2

You can use DataFrame.apply with axis=0 to apply a function to every column of a dataframe. In your case you want to check whether all(col==1) holds for each column. Then you can pick out the matching columns using a list comprehension, and finally use DataFrame.drop to drop those columns:

allonecols = df.apply(lambda col: all(col==1), axis = 0)
allonecols 
A     True
B    False
C    False
dtype: bool

dropcols = [k for k,v in allonecols.to_dict().items() if v]  
dropcols 
['A']

# drop returns a new dataframe, so assign the result back
df = df.drop(dropcols, axis = 1)
df2 = df2.drop(dropcols, axis = 1)
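
For reference, the steps above can be put together as one self-contained script. This is only a sketch: the seeded random generator and the use of the columns= keyword of drop are choices made here for illustration, not part of the answer, and boolean indexing replaces the list comprehension.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, used only so the example is reproducible)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df["A"] = 1

# Boolean Series indexed by column name: True where every value equals 1
allonecols = df.apply(lambda col: (col == 1).all(), axis=0)

# Keep only the labels whose flag is True
dropcols = allonecols[allonecols].index.tolist()

# drop returns a new dataframe, so assign the result back
df = df.drop(columns=dropcols)
df2 = df2.drop(columns=dropcols)

print(dropcols)
print(df.columns.tolist(), df2.columns.tolist())
```

Using (col == 1).all() instead of the builtin all(col == 1) keeps the check inside pandas, but both give the same answer for this data.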


2 Comments

Marius Over a year ago

Rather than using the list comprehension, you can use the allonecols series in .loc: df2.loc[:, ~ allonecols]

maxymoo Over a year ago

Nice one marius, never thought of using loc to index columns.
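
A small sketch of the .loc suggestion from the comments, using made-up three-row frames (the values are illustrative only). It assumes both dataframes share the same column labels, because the boolean Series is aligned to the columns by label.

```python
import pandas as pd

# Tiny illustrative frames; the numbers are invented for this example
df = pd.DataFrame({"A": [1, 1, 1], "B": [0.2, 0.5, 0.9], "C": [0.7, 0.1, 0.4]})
df2 = pd.DataFrame({"A": [0.3, 0.6, 0.8], "B": [0.4, 0.4, 0.2], "C": [0.9, 0.5, 0.1]})

# True for columns of df that contain only 1's
allonecols = df.apply(lambda col: (col == 1).all(), axis=0)

# ~ inverts the mask, so .loc keeps every column that is NOT all 1's
df_kept = df.loc[:, ~allonecols]
df2_kept = df2.loc[:, ~allonecols]

print(df_kept.columns.tolist())
print(df2_kept.columns.tolist())
```

This selects the columns to keep rather than naming the ones to drop, so no intermediate list of labels is needed.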

EdChum · Accepted Answer · 2015-06-02 07:56:29Z

1

I would suggest calling all on the boolean condition over the entire df rather than using apply:

In [122]:
col_to_drop = df.columns[(df==1).all()]
col_to_drop

Out[122]:
Index(['A'], dtype='object')

In [123]:    
df2.drop(col_to_drop, axis=1)
Out[123]:
          B         C
0  0.507605  0.134758
1  0.777054  0.285220
2  0.121124  0.430874
3  0.422746  0.775676
4  0.563303  0.659942
5  0.582580  0.437603
6  0.221917  0.339737
7  0.634779  0.172416
8  0.703110  0.730759
9  0.426673  0.923138

Calling all on the boolean comparison returns a Series with one boolean value per column:

In [124]:
(df==1).all()

Out[124]:
A     True
B    False
C    False
dtype: bool

You can then use this to mask the columns to return which column you wish to drop from df2 as shown above.
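
Put together as a plain script rather than an IPython session, the vectorised approach might look like the sketch below. The seeded generator and the columns= keyword are assumptions added for this example, and the results are assigned back so that the column is removed from both dataframes, as the question asks.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, used only so the example is reproducible)
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df["A"] = 1

# (df == 1) is a boolean dataframe; .all() reduces it to one flag per column
mask = (df == 1).all()

# Index the column labels with the mask to get the labels to drop
col_to_drop = df.columns[mask]

# drop returns a new dataframe, so assign the result back
df = df.drop(columns=col_to_drop)
df2 = df2.drop(columns=col_to_drop)

print(list(col_to_drop))
print(df.columns.tolist(), df2.columns.tolist())
```

If df2 might not contain every column found in df, passing errors="ignore" to drop avoids a KeyError for the missing labels.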


3 Comments

edesz Over a year ago

I marked this as the Answer. However, maxymoo's response is also very good.

EdChum Over a year ago

Although maxymoo's answer is correct, one should generally avoid apply when a vectorised method exists that can operate on the entire df, as this one does.

edesz Over a year ago

Thank you. I just initially found it easier to understand his Answer because I am more familiar with list comprehension. I now understand what you mean.
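
To see the point about vectorisation in practice, a rough timing sketch like the following can be run locally. The frame size, the helper names with_apply and vectorised, and the repeat count are all arbitrary choices for illustration; actual timings depend on the machine and the pandas version, so no numbers are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# A wider frame than in the question, so the difference is easier to measure
rng = np.random.default_rng(1)
wide = pd.DataFrame(rng.random((1000, 200)))
wide[0] = 1  # make the first column all 1's


def with_apply(frame):
    # Calls a Python function once per column
    return frame.apply(lambda col: (col == 1).all(), axis=0)


def vectorised(frame):
    # One comparison over the whole frame, then one reduction
    return (frame == 1).all()


# Both approaches should flag exactly the same columns
same = with_apply(wide).tolist() == vectorised(wide).tolist()

t_apply = timeit.timeit(lambda: with_apply(wide), number=5)
t_vec = timeit.timeit(lambda: vectorised(wide), number=5)
print(f"same result: {same}")
print(f"apply: {t_apply:.4f}s  vectorised: {t_vec:.4f}s")
```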
