How do I select columns from pandas dataframe that have an average value greater than some limit?

Question

I have a DataFrame with multiple columns, each a time series of some variable. I only want to keep the columns that are significant by some metric, i.e. a subset of columns such that, for each column, either

the average (over all rows) is greater than x, or
the max (over all rows) is greater than x

i | col1 | col2 | col3 | ...
0 | 0.10 | 0.50 | 0.30 | ...
1 | 0.09 | 0.40 | 0.40 | ...
2 | 0.08 | 0.45 | 0.36 | ...

For example, from the table above, the condition column_avg > 0.2 should select only [col2, col3].

The condition column_avg > 0.4 should select only col2.

Similarly, instead of the average, I would like to be able to condition on the min or max of each column.

Is x the same for all columns?

– Juan C, Commented Aug 19, 2019 at 20:46
Yes, the same condition applies to all columns.

– Dumbo, Commented Aug 19, 2019 at 20:47

qscgy · Accepted Answer · 2019-08-19 21:04:04Z

4

Try this:

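# Keep the columns whose mean over all rows is greater than 0.2
df2 = df[df.columns[df.mean(axis=0) > 0.2]]
# Keep the columns whose max over all rows is greater than 0.4
df3 = df[df.columns[df.max(axis=0) > 0.4]]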

df.min works the same way.
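As a rough, self-contained check of the lines above, here is a sketch that rebuilds the three rows from the question and applies the mean, max, and min filters. The DataFrame contents are copied from the question's table; the names by_mean, by_max, and by_min are only illustrative.

```python
import pandas as pd

# Sample data taken from the table in the question
df = pd.DataFrame({
    "col1": [0.10, 0.09, 0.08],
    "col2": [0.50, 0.40, 0.45],
    "col3": [0.30, 0.40, 0.36],
})

# axis=0 aggregates down the rows, giving one value per column
by_mean = df[df.columns[df.mean(axis=0) > 0.2]]
by_max = df[df.columns[df.max(axis=0) > 0.4]]
by_min = df[df.columns[df.min(axis=0) > 0.2]]

print(list(by_mean.columns))  # ['col2', 'col3']
print(list(by_max.columns))   # ['col2']
print(list(by_min.columns))   # ['col2', 'col3']
```

Note that the comparison is strict, so col3 (whose max is exactly 0.4) is not selected by the max filter.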


1 Comment

Dvir Itzko Over a year ago

Why not just use df2 = df.loc[:, df.mean(axis=0) > 0.2]? I think it's more straightforward than going through df.columns.
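For what it's worth, the two spellings appear to give the same result on the question's data. A minimal sketch, assuming the sample values from the question (mask, via_loc, and via_columns are illustrative names):

```python
import pandas as pd

# Sample data taken from the table in the question
df = pd.DataFrame({
    "col1": [0.10, 0.09, 0.08],
    "col2": [0.50, 0.40, 0.45],
    "col3": [0.30, 0.40, 0.36],
})

# Boolean Series indexed by column label: True where the column mean exceeds 0.2
mask = df.mean(axis=0) > 0.2

# .loc accepts the boolean mask directly on the column axis
via_loc = df.loc[:, mask]

# The accepted answer's form: turn the mask into labels first, then select
via_columns = df[df.columns[mask]]

print(via_loc.equals(via_columns))  # True
```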

Juan C · Accepted Answer · 2019-08-19 20:49:11Z

2

If you want to get every column with a mean over .4:

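# Mean of each column, indexed by column name
means = df.mean()
x = 0.4
# Collect the names of the columns whose mean is greater than x
useful_cols = [ind for m, ind in zip(means, means.index) if m > x]
df2 = df[useful_cols]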

To filter on the max instead, replace df.mean() with df.max().

Please tell me if there's something that needs explanation here.
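If you need to switch between mean, max, and min often, the same list-comprehension idea can be wrapped in a small helper. This is only a sketch: select_columns is a hypothetical name, not a pandas function, and the sample data is copied from the question.

```python
import pandas as pd


def select_columns(df, stat, x):
    """Return the columns of df whose per-column statistic is greater than x.

    select_columns is a hypothetical helper; stat is the name of an
    aggregation such as "mean", "max", or "min".
    """
    # One aggregated value per column, indexed by column name
    stats = df.agg(stat)
    useful_cols = [col for col, value in stats.items() if value > x]
    return df[useful_cols]


# Sample data taken from the table in the question
df = pd.DataFrame({
    "col1": [0.10, 0.09, 0.08],
    "col2": [0.50, 0.40, 0.45],
    "col3": [0.30, 0.40, 0.36],
})

print(list(select_columns(df, "mean", 0.4).columns))  # ['col2']
print(list(select_columns(df, "min", 0.2).columns))   # ['col2', 'col3']
```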
