I have a multi-indexed dataframe which contains some NaN values inside its index and rows.

In:

import pandas as pd
import numpy as np

row1 = {'index1' : 'abc', 'col1' : 'some_value', 'col3' : True}
row2 = {'index2' : 'xyz', 'col2' : 'other_value', 'col3' : np.nan}
row3 = {'index1' : 'def', 'col1' : 'different_value', 'col3' : False}
row4 = {'index2' : 'uvw', 'col2' : 'same_value', 'col3' : np.nan}
df = pd.DataFrame([row1, row2, row3, row4])

df.set_index(['index1', 'index2'], inplace=True)

print(df)

Out:

                          col1         col2   col3
index1 index2                                     
abc    NaN          some_value          NaN   True
NaN    xyz                 NaN  other_value    NaN
def    NaN     different_value          NaN  False
NaN    uvw                 NaN   same_value    NaN

Is there a way to get a subset of that dataframe by the condition col3 == True that also includes all "subrows" of the row where that condition holds?

When I go for

print(df[df.col3 == True])

I get

                     col1 col2  col3
index1 index2                       
abc    NaN     some_value  NaN  True

which is the row where the condition holds. However, what I am looking for is

                     col1         col2  col3
index1 index2                               
abc    NaN     some_value          NaN  True
NaN    xyz            NaN  other_value   NaN

That is, the result should also include the row which does not have the True value itself but is a "subrow" of the row with index1 == abc.

Is that possible? Or is the dataframe messed up and should be structured in a different way?

1 Answer

A simple solution would be to use a condition on a padded (forward-filled) col3, where each NaN is replaced with the value of the row it belongs to. Note that fillna(method='pad') is deprecated in recent pandas versions; ffill() does the same thing. For example:

>>> df['col3'].ffill()

index1  index2
abc     NaN        True
NaN     xyz        True
def     NaN       False
NaN     uvw       False
Name: col3, dtype: bool

Now you can apply the condition like this:

>>> df[df['col3'].ffill()]

                     col1         col2  col3
index1 index2                               
abc    NaN     some_value          NaN  True
NaN    xyz            NaN  other_value   NaN
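
For completeness, here is a self-contained sketch of the same idea. It assumes the subrows always come directly after their parent row, and it compares the filled column against True explicitly so the mask is a plain boolean Series regardless of the dtype that ffill() returns. The names mask and subset are just illustrative.

```python
import numpy as np
import pandas as pd

rows = [
    {'index1': 'abc', 'col1': 'some_value', 'col3': True},
    {'index2': 'xyz', 'col2': 'other_value', 'col3': np.nan},
    {'index1': 'def', 'col1': 'different_value', 'col3': False},
    {'index2': 'uvw', 'col2': 'same_value', 'col3': np.nan},
]
df = pd.DataFrame(rows).set_index(['index1', 'index2'])

# Forward-fill col3 so every subrow inherits the flag of the row above it,
# then compare against True to get a clean boolean mask.
# Assumption: subrows directly follow their parent row.
mask = df['col3'].ffill().eq(True)

# Keeps the 'abc' row and its 'xyz' subrow; col3 itself is left untouched.
subset = df[mask]
print(subset)
```

A leading NaN in col3 (a subrow with no parent above it) stays NaN after the fill and is therefore treated as False by the comparison.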

2 Comments

Yeah! I still feel like I'm doing something that isn't intended to be done, but this works for my purpose.
Yep, the way you are structuring your data looks odd. I think you should use the same index1 for all subrows; it would be more explicit, and you could then groupby/filter by index1. This is how I would do it. It wouldn't be much more efficient than the current way, but it would look better.
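
A rough sketch of what that restructuring could look like, assuming again that each subrow directly follows its parent row. The names flat and result are hypothetical, and this is only one possible way to read the suggestion above.

```python
import numpy as np
import pandas as pd

rows = [
    {'index1': 'abc', 'col1': 'some_value', 'col3': True},
    {'index2': 'xyz', 'col2': 'other_value', 'col3': np.nan},
    {'index1': 'def', 'col1': 'different_value', 'col3': False},
    {'index2': 'uvw', 'col2': 'same_value', 'col3': np.nan},
]
flat = pd.DataFrame(rows)

# Give every subrow the index1 of its parent row
# (assumption: subrows directly follow their parent).
flat['index1'] = flat['index1'].ffill()

# Keep each index1 group in which at least one row has col3 == True.
result = flat.groupby('index1').filter(lambda g: g['col3'].eq(True).any())
print(result)
```

With index1 filled in on every row, the parent/subrow relationship no longer depends on row order once the fill has been done.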
