1

Assume that I have the following dataframe:

+---+---------+------+------+------+
|   | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count   | 10   | 10   | 10   |
+---+---------+------+------+------+
| 1 | mean    | 4    | 5    | 5    |
+---+---------+------+------+------+
| 2 | stddev  | 3    | 3    | 3    |
+---+---------+------+------+------+
| 3 | min     | 0    | -1   | 5    |
+---+---------+------+------+------+
| 4 | max     | 100  | 56   | 47   |
+---+---------+------+------+------+

How can I keep only the columns where count > 5, mean > 4 and min > 0, while also keeping the summary column?

The desired output is:

+---+---------+------+
|   | summary | col3 |
+---+---------+------+
| 0 | count   | 10   |
+---+---------+------+
| 1 | mean    | 5    |
+---+---------+------+
| 2 | stddev  | 3    |
+---+---------+------+
| 3 | min     | 5    |
+---+---------+------+
| 4 | max     | 47   | 
+---+---------+------+
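To try the answers below, the table above can be rebuilt as follows. This is only a sketch: it assumes the values are plain integers. If they are stored as strings, they would need converting (for example with pd.to_numeric) before any of the comparisons will work.

```python
import pandas as pd

# Rebuild the summary table from the question; integer values are an assumption.
df = pd.DataFrame(
    {
        "summary": ["count", "mean", "stddev", "min", "max"],
        "col1": [10, 4, 3, 0, 100],
        "col2": [10, 5, 3, -1, 56],
        "col3": [10, 5, 3, 5, 47],
    }
)
print(df)
```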
1
  • easiest to transform the dataframe Commented Aug 12, 2019 at 16:02

5 Answers

3

You need:

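# Transpose so each statistic becomes a column and each original column a row.
df2 = df.set_index('summary').T
m1 = df2['count'] > 5
m2 = df2['mean'] > 4
m3 = df2['min'] > 0
# Keep the rows that pass every condition, then transpose back.
df2.loc[m1 & m2 & m3].T.reset_index()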

Output:

    summary col3
0   count   10
1   mean    5
2   stddev  3
3   min     5
4   max     47

Note: You can put the conditions directly in .loc[], but with multiple conditions it is more readable to assign each one to its own mask variable (m1, m2, m3).
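For comparison, here is a sketch of both spellings side by side. The dataframe is rebuilt from the question with integer values assumed, and the names with_masks and inline are only illustrative.

```python
import pandas as pd

# Example data from the question; integer values are an assumption.
df = pd.DataFrame(
    {
        "summary": ["count", "mean", "stddev", "min", "max"],
        "col1": [10, 4, 3, 0, 100],
        "col2": [10, 5, 3, -1, 56],
        "col3": [10, 5, 3, 5, 47],
    }
)

# Rows are col1..col3, columns are the statistics.
df2 = df.set_index("summary").T

# Separate masks, one per condition.
m1 = df2["count"] > 5
m2 = df2["mean"] > 4
m3 = df2["min"] > 0
with_masks = df2.loc[m1 & m2 & m3].T.reset_index()

# The same conditions written inline. Each comparison needs its own
# parentheses because & binds more tightly than >.
inline = df2.loc[
    (df2["count"] > 5) & (df2["mean"] > 4) & (df2["min"] > 0)
].T.reset_index()

print(inline.equals(with_masks))
```

Both forms select the same rows; the mask variables mainly help when the conditions get long or are reused.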



2

Use loc with a callable:

(df.set_index('summary').T
   .loc[lambda x: (x['count'] > 5) & (x['mean'] > 4) & (x['min'] > 0)]
   .T.reset_index())


1

Here is one way

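s = df.set_index('summary')
# Lower bound for each statistic, indexed by the row label it applies to.
com = pd.Series([5, 4, 0], index=['count', 'mean', 'min'])
# Columns whose count, mean and min all exceed their bounds.
idx = s.loc[com.index].gt(com, axis=0).all().loc[lambda x: x].index
s[idx]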
Out[142]: 
         col3
summary      
count      10
mean        5
stddev      3
min         5
max        47
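The chained expression that builds idx can be hard to read, so here is a sketch of the same logic split into steps. The intermediate names rows, passed and keep are only illustrative, and the dataframe is rebuilt from the question with integer values assumed.

```python
import pandas as pd

# Example data from the question; integer values are an assumption.
df = pd.DataFrame(
    {
        "summary": ["count", "mean", "stddev", "min", "max"],
        "col1": [10, 4, 3, 0, 100],
        "col2": [10, 5, 3, -1, 56],
        "col3": [10, 5, 3, 5, 47],
    }
)

s = df.set_index("summary")

# Lower bounds, keyed by the statistic they apply to.
com = pd.Series([5, 4, 0], index=["count", "mean", "min"])

rows = s.loc[com.index]        # keep only the count, mean and min rows
passed = rows.gt(com, axis=0)  # compare each row against its own bound
keep = passed.all()            # True for columns that pass every bound
idx = keep[keep].index         # labels of the columns to keep

print(s[idx])
```

Keeping the bounds in a Series means adding another condition is just one more entry in com, rather than another mask.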


1

General thrashing about plus query

(
    df.set_index('summary')
      .rename(str.title).T
      .query('Count > 5 and Mean > 4 and Min > 0')
      .T.rename(str.lower)
      .reset_index()
)

  summary  col3
0   count    10
1    mean     5
2  stddev     3
3     min     5
4     max    47

Shenanigans

(
    df[['summary']].join(
        df.iloc[:, 1:].loc[:, df.iloc[[0, 1, 3], 1:].T.gt([5, 4, 0]).all(axis=1)]
    )
)
  summary  col3
0   count    10
1    mean     5
2  stddev     3
3     min     5
4     max    47


0

Set the summary column as the index, then do this:

df.T.query("(count > 5) & (mean > 4) & (min > 0)").T

4 Comments

This does not work. count is not a column. :( Did you actually run the code?
@harvpan It is clearly stated that 'Set the summary column as the index', then transpose leads to count being a column
This is not a full answer, as it doesn't show the relevant steps. Further, count, mean and min are keywords in the query syntax, so it simply doesn't work as described.
@MichaelD. I am expecting the OP to be able to set the summary column as an index. But the solution does work. Here is a more complete answer that I've verified to work. Column names take precedence over methods.

df = pd.DataFrame([['count', 10, 10, 10], ['mean', 4, 5, 5], ['stdev', 3, 3, 3], ['min', 0, -1, 5], ['max', 100, 56, 47]])
df.columns = ['summary', 'col1', 'col2', 'col3']
df.set_index('summary').T.query("(count > 5) & (mean>4) & (min>0)").T
