Drop columns in pandas dataframe based on conditions

Question

Assume that I have the following dataframe:

+---+---------+------+------+------+
|   | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count   | 10   | 10   | 10   |
+---+---------+------+------+------+
| 1 | mean    | 4    | 5    | 5    |
+---+---------+------+------+------+
| 2 | stddev  | 3    | 3    | 3    |
+---+---------+------+------+------+
| 3 | min     | 0    | -1   | 5    |
+---+---------+------+------+------+
| 4 | max     | 100  | 56   | 47   |
+---+---------+------+------+------+

How can I keep only the columns where count > 5, mean > 4 and min > 0, while also keeping the summary column?

The desired output is:

+---+---------+------+
|   | summary | col3 |
+---+---------+------+
| 0 | count   | 10   |
+---+---------+------+
| 1 | mean    | 5    |
+---+---------+------+
| 2 | stddev  | 3    |
+---+---------+------+
| 3 | min     | 5    |
+---+---------+------+
| 4 | max     | 47   | 
+---+---------+------+
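For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above. It assumes pandas is imported as pd and that all values are plain integers, which the question does not state explicitly.

```python
import pandas as pd

# Rebuild the example table from the question.
# Assumption: every statistic is stored as an integer.
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

print(df)
```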

It's easiest to transpose the dataframe.

– Joost Döbken, Commented Aug 12, 2019 at 16:02

harpan · Accepted Answer · 2019-08-12 16:02:28Z

3

You need:

df2 = df.set_index('summary').T
m1 = df2['count'] > 5
m2 = df2['mean'] > 4
m3 = df2['min'] > 0
df2.loc[m1 & m2 & m3].T.reset_index()

Output:

    summary col3
0   count   10
1   mean    5
2   stddev  3
3   min     5
4   max     47

Note: You can put the conditions directly inside .loc[], but with multiple conditions it is more readable to build separate mask variables (m1, m2, m3) first.
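For comparison, this is roughly what the inline form mentioned in the note might look like. It is only a sketch: the dataframe is rebuilt from the question's table (integer values assumed), and the name result is just for illustration.

```python
import pandas as pd

# Example data from the question (integer values assumed).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

# After the transpose, col1..col3 are rows and the statistics are columns.
df2 = df.set_index("summary").T

# Same filter as m1 & m2 & m3, written inline.
# Each comparison needs its own parentheses because & binds tighter than >.
result = df2.loc[
    (df2["count"] > 5) & (df2["mean"] > 4) & (df2["min"] > 0)
].T.reset_index()

print(result)
```

Both forms select the same rows; the inline one just gets harder to read as the number of conditions grows.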

answered Aug 12, 2019 at 16:02

harpan


Mark Wang · Accepted Answer · 2019-08-12 16:18:13Z

2

loc with callable.

(df.set_index('summary').T
   .loc[lambda x: (x['count'] > 5) & (x['mean'] > 4) & (x['min'] > 0)]
   .T.reset_index())

edited Aug 12, 2019 at 16:18

answered Aug 12, 2019 at 16:12

Mark Wang


BENY · Accepted Answer · 2019-08-12 16:04:10Z

1

Here is one way

s=df.set_index('summary')
com=pd.Series([5,4,0],index=['count','mean','min'])
idx=s.loc[com.index].gt(com,axis=0).all().loc[lambda x : x].index
s[idx]
Out[142]: 
         col3
summary      
count      10
mean        5
stddev      3
min         5
max        47
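The one-liner above packs several steps together, so here is a rough step-by-step version of the same idea. The intermediate names (thresholds, stats, passes, keep, kept_columns) are illustrative and not part of the original answer, and the dataframe is rebuilt from the question with integer values assumed.

```python
import pandas as pd

# Example data from the question (integer values assumed).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

s = df.set_index("summary")

# One lower bound per statistic, keyed by the row label it applies to.
thresholds = pd.Series([5, 4, 0], index=["count", "mean", "min"])

# Pick only the rows that have a threshold, then compare row by row.
# axis=0 aligns the Series labels with the row index of the frame.
stats = s.loc[thresholds.index]
passes = stats.gt(thresholds, axis=0)

# A column survives only if it passes every condition.
keep = passes.all()              # boolean Series indexed by column name
kept_columns = keep[keep].index  # labels where the value is True

result = s[kept_columns]
print(result)
```

Keeping the thresholds in a Series makes it easy to add or change a condition without touching the filtering logic.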

answered Aug 12, 2019 at 16:04

BENY


piRSquared · Accepted Answer · 2019-08-12 16:26:56Z

1

General thrashing about plus `query`

(
    df.set_index('summary')
      .rename(str.title).T
      .query('Count > 5 & Mean > 4 and Min > 0')
      .T.rename(str.lower)
      .reset_index()
)

  summary  col3
0   count    10
1    mean     5
2  stddev     3
3     min     5
4     max    47

Shenanigans

(
    df[['summary']].join(
        df.iloc[:, 1:].loc[:, df.iloc[[0, 1, 3], 1:].T.gt([5, 4, 0]).all(axis=1)]
    )
)
  summary  col3
0   count    10
1    mean     5
2  stddev     3
3     min     5
4     max    47
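The positional version is fairly dense, so here is a sketch of it split into named steps. The names values, checked, keep and result are mine, not the answer's, and the data is rebuilt from the question with integer values assumed. Note that this approach depends on count, mean and min sitting at row positions 0, 1 and 3.

```python
import pandas as pd

# Example data from the question (integer values assumed).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

# Everything except the leading summary column.
values = df.iloc[:, 1:]

# Rows 0, 1 and 3 hold count, mean and min (selected by position),
# transposed so that each data column becomes one row to test.
checked = df.iloc[[0, 1, 3], 1:].T

# Compare each row against [5, 4, 0] and require all three to pass.
keep = checked.gt([5, 4, 0]).all(axis=1)

# Keep the passing columns and put the summary column back in front.
result = df[["summary"]].join(values.loc[:, keep])
print(result)
```

If the row order of the summary table can change, a label-based version (setting summary as the index first) is probably the safer choice.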

edited Aug 12, 2019 at 16:26

answered Aug 12, 2019 at 16:14

piRSquared


Karthik V · Accepted Answer · 2019-08-12 16:04:54Z

0

Set the summary column as the index and then do this:

df.T.query("(count > 5) & (mean > 4) & (min > 0)").T

answered Aug 12, 2019 at 16:04

Karthik V

4 Comments

harpan Over a year ago

This does not work. count is not a column. :( Did you actually run the code?

Mark Wang Over a year ago

@harvpan It is clearly stated that 'Set the summary column as the index', then transpose leads to count being a column

MichaelD Over a year ago

This is not a full answer as it doesn't show the relevant steps. Further, count, mean and min are keywords in the query syntax, so it simply doesn't work as described.

Karthik V Over a year ago

@MichaelD. I am expecting the OP to be able to set the summary column as the index. But the solution does work. Here is a more complete answer that I've verified to work. Column names take precedence over methods.

df = pd.DataFrame([['count', 10, 10, 10], ['mean', 4, 5, 5], ['stdev', 3, 3, 3], ['min', 0, -1, 5], ['max', 100, 56, 47]])

df.columns = ['summary', 'col1', 'col2', 'col3']

df.set_index('summary').T.query("(count > 5) & (mean>4) & (min>0)").T
