2

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 6),
                  columns=pd.MultiIndex.from_arrays((['A','A','A','B','B','B'],
                                                     ['a', 'b', 'c', 'a', 'b', 'c'])))
df
          A                             B                    
          a         b         c         a         b         c
0 -0.089902 -2.235642  0.282761  0.725579  1.266029 -0.354892
1 -1.753303  1.092057  0.484323  1.789094 -0.316307  0.416002
2 -0.409028 -0.920366 -0.396802 -0.569926 -0.538649 -0.844967
3  1.789569 -0.935632  0.004476 -1.873532 -1.136138 -0.867943
4  0.244112  0.298361 -1.607257 -0.181820  0.577446  0.556841
5  0.903908 -1.379358  0.361620  1.290646 -0.523404 -0.518992

I would like to select only the rows that have a value greater than 0 in a c column. I figured I would have to use pd.IndexSlice to select only the second-level label c.

idx = pd.IndexSlice
df.loc[:,idx[:,['c']]] > 0
       A      B
       c      c
0   True  False
1   True   True
2  False  False
3   True  False
4  False   True
5   True  False

I expected that I could now simply do df[df.loc[:,idx[:,['c']]] > 0]. However, that gives me an unexpected result:

df[df.loc[:,idx[:,['c']]] > 0]
    A                 B              
    a   b         c   a   b         c
0 NaN NaN  0.282761 NaN NaN       NaN
1 NaN NaN  0.484323 NaN NaN  0.416002
2 NaN NaN       NaN NaN NaN       NaN
3 NaN NaN  0.004476 NaN NaN       NaN
4 NaN NaN       NaN NaN NaN  0.556841
5 NaN NaN  0.361620 NaN NaN       NaN

What I would like instead is all the original values (no NaNs), restricted to the rows where any of the c columns is greater than 0:

          A                             B                    
          a         b         c         a         b         c
0 -0.089902 -2.235642  0.282761  0.725579  1.266029 -0.354892
1 -1.753303  1.092057  0.484323  1.789094 -0.316307  0.416002
3  1.789569 -0.935632  0.004476 -1.873532 -1.136138 -0.867943
4  0.244112  0.298361 -1.607257 -0.181820  0.577446  0.556841
5  0.903908 -1.379358  0.361620  1.290646 -0.523404 -0.518992

So I probably need to sneak an any() in there somewhere, but I am not sure how to do that. Any hints?

1
  • Tough call. Selected @W-B's answer because it was first. Commented Dec 21, 2018 at 16:16

2 Answers

5

Another version, using get_level_values:

df[(df.iloc[:, df.columns.get_level_values(1) == 'c'] > 0).any(axis=1)]
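For a reproducible check, here is a minimal sketch on a small fixed frame. The values are made up for illustration and are not the question's random data. It builds the same kind of column mask from get_level_values, and compares it with DataFrame.xs, which is a swapped-in alternative for picking one label from the second column level rather than part of the answer above.

```python
import pandas as pd

# Small fixed frame with the same two-level column layout as in the question
# (values are invented so that the result is reproducible).
columns = pd.MultiIndex.from_arrays((['A', 'A', 'A', 'B', 'B', 'B'],
                                     ['a', 'b', 'c', 'a', 'b', 'c']))
df = pd.DataFrame([[1, 1, -1, 1, 1, -2],
                   [1, 1,  2, 1, 1, -2],
                   [1, 1, -1, 1, 1,  3],
                   [1, 1,  4, 1, 1,  5]], columns=columns)

# Boolean array marking the columns whose second-level label is 'c'
is_c = df.columns.get_level_values(1) == 'c'

# Row mask: True where at least one 'c' column is positive
row_mask = (df.loc[:, is_c] > 0).any(axis=1)
result = df[row_mask]

# Alternative: xs selects the 'c' label from level 1 of the columns
row_mask_xs = (df.xs('c', axis=1, level=1) > 0).any(axis=1)
```

Both masks should agree. Only the first row has no positive value in either c column, so it is the one that gets dropped.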

1

You are looking for any, applied across the columns (axis=1):

df[(df.loc[:,idx[:,['c']]]>0).any(axis = 1)]
Out[133]: 
          A                             B                    
          a         b         c         a         b         c
1 -0.423313  0.459464 -1.457655 -0.559667 -0.056230  1.338850
3 -0.072396  1.305868 -1.239441 -0.708834  0.348704  0.260532
4 -1.415575  1.229508  0.148254 -0.812806  1.379552 -1.195062
5 -0.336973 -0.469335  1.345719  0.847943  1.465100 -0.285792
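As for why the original attempt produced NaNs: indexing a DataFrame with a boolean DataFrame masks element by element (much like df.where), and columns missing from the mask are treated as False. Indexing with a boolean Series selects whole rows instead. A minimal sketch on a small made-up frame (the values are illustrative, not the question's data):

```python
import pandas as pd

idx = pd.IndexSlice
columns = pd.MultiIndex.from_arrays((['A', 'A', 'B', 'B'],
                                     ['a', 'c', 'a', 'c']))
df = pd.DataFrame([[1.0, -1.0, 2.0, -2.0],
                   [3.0,  4.0, 5.0, -6.0],
                   [7.0, -8.0, 9.0, 10.0]], columns=columns)

# Boolean DataFrame covering only the two 'c' columns
cell_mask = df.loc[:, idx[:, ['c']]] > 0

# Boolean DataFrame as indexer: same shape as df, NaN wherever the mask
# is False or has no matching column
masked = df[cell_mask]

# Boolean Series as indexer: keeps complete rows where any 'c' value is positive
row_mask = cell_mask.any(axis=1)
selected = df[row_mask]
```

Here masked keeps the full shape of df with only the positive c values left, while selected contains complete rows and no NaNs.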

3 Comments

Another one of those questions that make me look stupid :D Thanks!! Your answer works perfectly, but row 0 seems to be missing.
@n1000 The values are random, so my input may differ from yours. :-)
@n1000 But still, happy coding :-) It is Friday!
