
I have 2 Pandas Dataframes in Python. Here they are:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,3),columns=list('ABC'))
df2 = pd.DataFrame(np.random.rand(10,3),columns=list('ABC'))
df['A'] = 1
print(df)
print(df2)

   A         B         C
0  1  0.333141  0.803991
1  1  0.043958  0.582038
2  1  0.833433  0.782856
3  1  0.722592  0.237912
4  1  0.634979  0.664208
5  1  0.809748  0.889524
6  1  0.110342  0.650617
7  1  0.035417  0.251089
8  1  0.481492  0.128792
9  1  0.190135  0.213608

          A         B         C
0  0.897373  0.599721  0.361668
1  0.495024  0.471351  0.090395
2  0.651174  0.621328  0.721208
3  0.253459  0.567619  0.104370
4  0.357627  0.616717  0.775327
5  0.164323  0.716166  0.740565
6  0.841509  0.464837  0.398952
7  0.398680  0.186555  0.293076
8  0.298785  0.784237  0.704184
9  0.124763  0.384852  0.307361

As you can see in df, there is one column with only 1's.

I need to do the following:

  1. Find the name of the column in a Dataframe (df) that contains only 1's in all rows.
  2. Drop that column from df
  3. Drop that SAME column from df2

I would like to get this:

          B         C
0  0.333141  0.803991
1  0.043958  0.582038
2  0.833433  0.782856
3  0.722592  0.237912
4  0.634979  0.664208
5  0.809748  0.889524
6  0.110342  0.650617
7  0.035417  0.251089
8  0.481492  0.128792
9  0.190135  0.213608

          B         C
0  0.599721  0.361668
1  0.471351  0.090395
2  0.621328  0.721208
3  0.567619  0.104370
4  0.616717  0.775327
5  0.716166  0.740565
6  0.464837  0.398952
7  0.186555  0.293076
8  0.784237  0.704184
9  0.384852  0.307361

Is there a way to do this?

2 Answers


You can use DataFrame.apply with axis=0 to apply a function to every column of a dataframe. In your case you want to check whether all(col==1) holds for each column. Then you can pick out the matching columns using a list comprehension, and finally use DataFrame.drop to drop them:

allonecols = df.apply(lambda col: all(col==1), axis = 0)
allonecols 
A     True
B    False
C    False
dtype: bool

dropcols = [k for k,v in allonecols.to_dict().items() if v]  
dropcols 
['A']

df2.drop(dropcols, axis = 1)
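Note that drop returns a new DataFrame rather than modifying the original, so the result has to be assigned back to get the output the question asks for. Here is a minimal end-to-end sketch of the same approach applied to both frames. The seeded generator is my own addition, only there to make the run reproducible:

```python
import numpy as np
import pandas as pd

# Seeded generator is an assumption, used only for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df2 = pd.DataFrame(rng.random((10, 3)), columns=list("ABC"))
df["A"] = 1

# Boolean Series: True for each column whose values all equal 1.
allonecols = df.apply(lambda col: (col == 1).all(), axis=0)

# Collect the names of the columns flagged True.
dropcols = [name for name, is_all_ones in allonecols.items() if is_all_ones]

# drop returns a copy, so assign the result back to keep it.
df = df.drop(dropcols, axis=1)
df2 = df2.drop(dropcols, axis=1)

print(dropcols)
print(df.columns.tolist(), df2.columns.tolist())
```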

2 Comments

Rather than using the list comprehension, you can use the allonecols series in .loc: df2.loc[:, ~ allonecols]
Nice one marius, never thought of using loc to index columns.
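The .loc suggestion from the comment above can be sketched as follows. The small hand-written frames are illustrative stand-ins for df and df2, and the sketch assumes both frames share the same column labels, since the boolean mask is aligned on them:

```python
import pandas as pd

# Illustrative data (not from the question); column A holds only 1's in df.
df = pd.DataFrame({"A": [1, 1, 1], "B": [0.1, 0.2, 0.3], "C": [1.0, 0.5, 1.0]})
df2 = pd.DataFrame({"A": [0.9, 0.4, 0.6], "B": [0.5, 0.4, 0.6], "C": [0.3, 0.1, 0.7]})

# Boolean Series indexed by column name: True where every value equals 1.
allonecols = df.apply(lambda col: (col == 1).all(), axis=0)

# ~ inverts the mask, so .loc keeps only the columns that are NOT all ones.
df_kept = df.loc[:, ~allonecols]
df2_kept = df2.loc[:, ~allonecols]

print(df_kept.columns.tolist(), df2_kept.columns.tolist())
```

Column C contains some 1's but not only 1's, so it is kept.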

I would suggest calling all on the boolean comparison of the entire df rather than using apply:

In [122]:
col_to_drop = df.columns[(df==1).all()]
col_to_drop

Out[122]:
Index(['A'], dtype='object')

In [123]:    
df2.drop(col_to_drop, axis=1)
Out[123]:
          B         C
0  0.507605  0.134758
1  0.777054  0.285220
2  0.121124  0.430874
3  0.422746  0.775676
4  0.563303  0.659942
5  0.582580  0.437603
6  0.221917  0.339737
7  0.634779  0.172416
8  0.703110  0.730759
9  0.426673  0.923138

Calling all on the boolean comparison returns a Series with one boolean value per column:

In [124]:
(df==1).all()

Out[124]:
A     True
B    False
C    False
dtype: bool

You can then use this to mask the columns to return which column you wish to drop from df2 as shown above.
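Putting the pieces together for both frames might look like the sketch below. The sample values are made up for illustration, and errors="ignore" is an optional safeguard I added in case df2 lacks one of the columns:

```python
import pandas as pd

# Illustrative data (not from the question); column A holds only 1's in df.
df = pd.DataFrame({"A": [1, 1, 1], "B": [0.3, 0.0, 0.8], "C": [0.8, 0.5, 0.7]})
df2 = pd.DataFrame({"A": [0.8, 0.4, 0.6], "B": [0.5, 0.4, 0.6], "C": [0.3, 0.0, 0.7]})

# Mask the column index with the per-column result of the comparison.
col_to_drop = df.columns[(df == 1).all()]

# drop returns a copy, so assign back; errors="ignore" skips labels
# that are absent from df2 instead of raising a KeyError.
df = df.drop(columns=col_to_drop)
df2 = df2.drop(columns=col_to_drop, errors="ignore")

print(list(col_to_drop))
print(df.columns.tolist(), df2.columns.tolist())
```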

3 Comments

I marked this as the Answer. However, maxymoo's response is also very good.
Although maxymoo's answer is correct, one should generally avoid apply when there is a vectorised method that can operate on the entire df, which is the case here.
Thank you. I initially found his answer easier to understand because I am more familiar with list comprehensions. I now understand what you mean.
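To see the difference between the two approaches on your own machine, a rough timing sketch could look like this. The frame size and repeat count are arbitrary choices of mine, and the actual timings will vary with hardware and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test frame: 1000 rows x 200 columns, with column 0 set to all 1's.
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.random((1000, 200)))
wide[0] = 1

# Both approaches should flag the same columns.
via_apply = wide.apply(lambda col: (col == 1).all(), axis=0)
vectorised = (wide == 1).all()

# Time each approach over a handful of runs.
t_apply = timeit.timeit(
    lambda: wide.apply(lambda col: (col == 1).all(), axis=0), number=5
)
t_vec = timeit.timeit(lambda: (wide == 1).all(), number=5)

print(f"apply: {t_apply:.4f}s  vectorised: {t_vec:.4f}s")
```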
