Conditional selecting rows in pandas DataFrame with MultiIndex

Question

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 6),
                  columns=pd.MultiIndex.from_arrays((['A', 'A', 'A', 'B', 'B', 'B'],
                                                     ['a', 'b', 'c', 'a', 'b', 'c'])))
df
          A                             B                    
          a         b         c         a         b         c
0 -0.089902 -2.235642  0.282761  0.725579  1.266029 -0.354892
1 -1.753303  1.092057  0.484323  1.789094 -0.316307  0.416002
2 -0.409028 -0.920366 -0.396802 -0.569926 -0.538649 -0.844967
3  1.789569 -0.935632  0.004476 -1.873532 -1.136138 -0.867943
4  0.244112  0.298361 -1.607257 -0.181820  0.577446  0.556841
5  0.903908 -1.379358  0.361620  1.290646 -0.523404 -0.518992

I would like to select only the rows that have a value larger than 0 in any of the c columns. I figured I would have to use pd.IndexSlice to select only the columns whose second-level label is c.

idx = pd.IndexSlice
df.loc[:,idx[:,['c']]] > 0
       A      B
       c      c
0   True  False
1   True   True
2  False  False
3   True  False
4  False   True
5   True  False
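As an aside, the same two c columns can also be pulled out with DataFrame.xs, which selects by a label on one level and, by default, drops that level from the result. A small sketch comparing the two selections on a seeded random frame (the seed is arbitrary and only makes the run repeatable):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for repeatable output
df = pd.DataFrame(
    np.random.randn(6, 6),
    columns=pd.MultiIndex.from_arrays(
        [["A", "A", "A", "B", "B", "B"], ["a", "b", "c", "a", "b", "c"]]
    ),
)
idx = pd.IndexSlice

# IndexSlice keeps both column levels: ('A', 'c'), ('B', 'c')
via_slice = df.loc[:, idx[:, ["c"]]]

# xs on level 1 drops that level, so the columns become 'A', 'B'
via_xs = df.xs("c", axis=1, level=1)

print(via_slice.columns.tolist())  # [('A', 'c'), ('B', 'c')]
print(via_xs.columns.tolist())     # ['A', 'B']
```

Either selection can feed the `> 0` comparison; the values are identical, only the column labels differ.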

So I expected that I could simply do df[df.loc[:,idx[:,['c']]] > 0], but that gives me an unexpected result:

df[df.loc[:,idx[:,['c']]] > 0]
    A                 B              
    a   b         c   a   b         c
0 NaN NaN  0.282761 NaN NaN       NaN
1 NaN NaN  0.484323 NaN NaN  0.416002
2 NaN NaN       NaN NaN NaN       NaN
3 NaN NaN  0.004476 NaN NaN       NaN
4 NaN NaN       NaN NaN NaN  0.556841
5 NaN NaN  0.361620 NaN NaN       NaN
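The NaNs come from how pandas treats a boolean DataFrame inside []: it masks element by element instead of selecting rows, behaving like df.where(mask). The mask is aligned to all six columns, and the a and b columns it lacks are treated as False, so they become entirely NaN while all six rows are kept. A short check of that equivalence on a seeded random frame (the seed is arbitrary):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for repeatable output
df = pd.DataFrame(
    np.random.randn(6, 6),
    columns=pd.MultiIndex.from_arrays(
        [["A", "A", "A", "B", "B", "B"], ["a", "b", "c", "a", "b", "c"]]
    ),
)
idx = pd.IndexSlice

mask = df.loc[:, idx[:, ["c"]]] > 0  # 6 x 2 boolean frame

masked = df[mask]      # element-wise masking, not row filtering
same = df.where(mask)  # columns missing from the mask count as False

print(masked.equals(same))  # True
print(masked.shape)         # (6, 6): no rows are dropped
```

Row filtering needs a one-dimensional boolean Series indexed like the rows, which is what reducing the mask with .any(axis=1) produces.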

What I would like instead is all values (no NaNs), keeping only the rows where at least one of the c columns is greater than 0:

          A                             B                    
          a         b         c         a         b         c
0 -0.089902 -2.235642  0.282761  0.725579  1.266029 -0.354892
1 -1.753303  1.092057  0.484323  1.789094 -0.316307  0.416002
3  1.789569 -0.935632  0.004476 -1.873532 -1.136138 -0.867943
4  0.244112  0.298361 -1.607257 -0.181820  0.577446  0.556841
5  0.903908 -1.379358  0.361620  1.290646 -0.523404 -0.518992

So I probably need to sneak an any() in somewhere, but I am not sure how to do that. Any hints?

Tough call. Selected @W-B's answer because it was first.

– n1000, Commented Dec 21, 2018 at 16:16

zyxue · Answer · 2018-12-21 16:05:40Z

5

Another version, using get_level_values:

df[(df.iloc[:, df.columns.get_level_values(1) == 'c'] > 0).any(axis=1)]
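How this works: df.columns.get_level_values(1) returns the second-level labels, comparing them to 'c' gives one boolean per column, and iloc accepts that array to pick the c columns by position. The > 0 comparison followed by .any(axis=1) then yields one boolean per row, which df[...] uses for row filtering. A self-contained sketch on a small hand-made frame (the numbers are invented for illustration):

```python
import pandas as pd

# Small made-up frame with the same two-level column layout as the question
df = pd.DataFrame(
    [
        [1.0, 2.0, -1.0, 3.0, 4.0, -2.0],  # no c value > 0
        [1.0, 2.0, 0.5, 3.0, 4.0, -2.0],   # A/c > 0
        [1.0, 2.0, -1.0, 3.0, 4.0, 0.7],   # B/c > 0
        [1.0, 2.0, 0.0, 3.0, 4.0, 0.0],    # zero is not > 0
    ],
    columns=pd.MultiIndex.from_arrays(
        [["A", "A", "A", "B", "B", "B"], ["a", "b", "c", "a", "b", "c"]]
    ),
)

is_c = df.columns.get_level_values(1) == "c"   # one bool per column
row_mask = (df.iloc[:, is_c] > 0).any(axis=1)  # one bool per row
result = df[row_mask]                          # all six columns are kept

print(result.index.tolist())  # [1, 2]
```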


BENY · Accepted Answer · 2018-12-21 16:01:34Z

1

You are looking for any():

df[(df.loc[:, idx[:, ['c']]] > 0).any(axis=1)]
Out[133]: 
          A                             B                    
          a         b         c         a         b         c
1 -0.423313  0.459464 -1.457655 -0.559667 -0.056230  1.338850
3 -0.072396  1.305868 -1.239441 -0.708834  0.348704  0.260532
4 -1.415575  1.229508  0.148254 -0.812806  1.379552 -1.195062
5 -0.336973 -0.469335  1.345719  0.847943  1.465100 -0.285792
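The rows above differ from the question's because the answer ran on a different random draw (see the comments below). A sketch that applies the same expression to the values printed in the question (rounded to six decimals as displayed) returns exactly the rows the asker wanted, 0, 1, 3, 4 and 5:

```python
import numpy as np
import pandas as pd

# Values copied from the question's printed frame (6 decimals, as displayed)
values = np.array([
    [-0.089902, -2.235642, 0.282761, 0.725579, 1.266029, -0.354892],
    [-1.753303, 1.092057, 0.484323, 1.789094, -0.316307, 0.416002],
    [-0.409028, -0.920366, -0.396802, -0.569926, -0.538649, -0.844967],
    [1.789569, -0.935632, 0.004476, -1.873532, -1.136138, -0.867943],
    [0.244112, 0.298361, -1.607257, -0.181820, 0.577446, 0.556841],
    [0.903908, -1.379358, 0.361620, 1.290646, -0.523404, -0.518992],
])
df = pd.DataFrame(
    values,
    columns=pd.MultiIndex.from_arrays(
        [["A", "A", "A", "B", "B", "B"], ["a", "b", "c", "a", "b", "c"]]
    ),
)
idx = pd.IndexSlice

c_positive = df.loc[:, idx[:, ["c"]]] > 0

any_c = df[c_positive.any(axis=1)]  # at least one c column > 0
all_c = df[c_positive.all(axis=1)]  # every c column > 0

print(any_c.index.tolist())  # [0, 1, 3, 4, 5]
print(all_c.index.tolist())  # [1]
```

Swapping .any for .all keeps only the rows where every c column is positive; with these values that leaves just row 1.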


3 Comments

n1000 Over a year ago

Another one of those questions that make me look stupid :D Thanks!! Your answer works perfectly, but row 0 seems to be missing.

BENY Over a year ago

@n1000 It is random data, so my input may be different from yours. :-)

BENY Over a year ago

@n1000 But still, happy coding :-) It is Friday!
