Select rows and subrows in pandas multiindex dataframe by condition

Question

I have a multi-indexed dataframe that contains some NaN values in both its index levels and its columns.

In:

import pandas as pd
import numpy as np

row1 = {'index1' : 'abc', 'col1' : 'some_value', 'col3' : True}
row2 = {'index2' : 'xyz', 'col2' : 'other_value', 'col3' : np.nan}
row3 = {'index1' : 'def', 'col1' : 'different_value', 'col3' : False}
row4 = {'index2' : 'uvw', 'col2' : 'same_value', 'col3' : np.nan}
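# Pass the column order explicitly so the printout matches across pandas versions
df = pd.DataFrame([row1, row2, row3, row4],
                  columns=['index1', 'index2', 'col1', 'col2', 'col3'])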

df.set_index(['index1', 'index2'], inplace=True)

print(df)

Out:

                          col1         col2   col3
index1 index2                                     
abc    NaN          some_value          NaN   True
NaN    xyz                 NaN  other_value    NaN
def    NaN     different_value          NaN  False
NaN    uvw                 NaN   same_value    NaN

Is it possible to get the subset of that dataframe where col3 == True that also includes all "subrows" of each row where that condition holds?

When I use

print(df[df.col3 == True])

I get

                     col1 col2  col3
index1 index2                       
abc    NaN     some_value  NaN  True

which is the row where the condition holds. However, what I am looking for is

                     col1         col2  col3
index1 index2
abc    NaN     some_value          NaN  True
NaN    xyz            NaN  other_value   NaN

That is, the result should also include the row that does not contain True itself but is a "subrow" of the row with index1 == abc.

Is that possible? Or is the dataframe messed up and should be structured in a different way?

elyase · Accepted Answer · 2014-12-10 18:59:55Z


A simple solution is to apply the condition to a forward-filled ("padded") copy of col3, in which each NaN is replaced by the value of the parent row above it. For example:

>>> df['col3'].ffill().astype(bool)

index1  index2
abc     NaN        True
NaN     xyz        True
def     NaN       False
NaN     uvw       False
Name: col3, dtype: bool

Now you can apply the condition like this:

>>> df[df['col3'].ffill().astype(bool)]

                     col1         col2  col3
index1 index2
abc    NaN     some_value          NaN  True
NaN    xyz            NaN  other_value   NaN

answered Dec 10, 2014 at 18:53
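A self-contained sketch of the same forward-fill idea for current pandas versions follows. `select_with_subrows` is a hypothetical helper name, not part of pandas, and it assumes, as in the question, that each subrow sits directly below its parent row and has NaN in the condition column. Casting that column to float (True/False/NaN become 1.0/0.0/NaN) before filling is a choice of this sketch that keeps the fill numeric, and `.eq(1.0)` turns any leading NaN (a subrow with no parent above it) into False instead of leaving a NaN in the mask, which boolean indexing would reject.

```python
import numpy as np
import pandas as pd


def select_with_subrows(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: keep each row whose parent row has df[col] == True.

    Assumes parent rows hold True/False in `col` and that their subrows,
    listed directly below them, hold NaN there (the layout in the question).
    """
    # True/False/NaN -> 1.0/0.0/NaN, then copy each parent's value down.
    padded = df[col].astype(float).ffill()
    # Boolean mask; a leading NaN (subrow without a parent) compares as False.
    mask = padded.eq(1.0)
    return df[mask]


rows = [
    {'index1': 'abc', 'col1': 'some_value', 'col3': True},
    {'index2': 'xyz', 'col2': 'other_value', 'col3': np.nan},
    {'index1': 'def', 'col1': 'different_value', 'col3': False},
    {'index2': 'uvw', 'col2': 'same_value', 'col3': np.nan},
]
df = pd.DataFrame(rows).set_index(['index1', 'index2'])

result = select_with_subrows(df, 'col3')
print(result)
```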

2 Comments

Dirk

Yeah! I still feel like I am doing something that is not intended to be done, but this works for my purpose.

elyase

Yep, the way you are structuring your data looks odd. I think you should repeat the same index1 on all subrows: that would be more explicit, and you could then groupby/filter by index1. This is how I would do it. It wouldn't be much more efficient than the current approach, but it would be clearer.
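The restructuring suggested in the comment above could look roughly like this. It is a minimal sketch that assumes each parent row comes directly before its subrows, so forward-filling `index1` gives every subrow its parent's key; the names `flat`, `tidy` and `selected` are illustrative. `groupby(...).filter(...)` then keeps whole `index1` groups in which at least one row has col3 == True.

```python
import numpy as np
import pandas as pd

rows = [
    {'index1': 'abc', 'col1': 'some_value', 'col3': True},
    {'index2': 'xyz', 'col2': 'other_value', 'col3': np.nan},
    {'index1': 'def', 'col1': 'different_value', 'col3': False},
    {'index2': 'uvw', 'col2': 'same_value', 'col3': np.nan},
]
flat = pd.DataFrame(rows)

# Give every subrow the index1 of the parent row directly above it,
# so a parent and its subrows share one explicit group key.
flat['index1'] = flat['index1'].ffill()
tidy = flat.set_index(['index1', 'index2'])

# Keep entire index1 groups in which some row has col3 == True
# (NaN == True evaluates to False, so subrows never trigger the match).
selected = tidy.groupby(level='index1').filter(
    lambda group: group['col3'].eq(True).any()
)
print(selected)
```

Once index1 is explicit on every row, the selection no longer depends on row order, which is the main readability gain the comment points to.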