2

I have a MultiIndex DataFrame and I cannot figure out how to modify a value in the row index.

In this example, I would like to set c = 1 where the "a" index is 4:

import pandas as pd
import numpy as np

df = pd.DataFrame({('colA', 'x1'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan},
('colA', 'x2'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan},
('colA', 'x3'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan},
('colA', 'x4'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan}})

df.index.set_names(['a', 'b', 'c'], inplace=True)

print(df)


            colA
              x1    x2  x3  x4
a   b   c               
1   NaN 0   NaN NaN NaN NaN
4   NaN 0   NaN NaN NaN NaN

Desired output:

            colA
              x1    x2  x3  x4
a   b   c               
1   NaN 0   NaN NaN NaN NaN
4   NaN 1   NaN NaN NaN NaN

Any help is appreciated.

7
  • pandas[df.index['a'] == 4] = 1 ? maybe Commented May 14, 2020 at 20:12
  • Not sure what you meant by "pandas" Commented May 14, 2020 at 20:14
  • oops, pandas should be df: df[df.index['a'] == 4] = 1 Commented May 14, 2020 at 20:20
  • 1
    This will give IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices Commented May 14, 2020 at 20:28
  • 1
    Hmm, sorry, it's a bit harder than I expected. I don't have time to solve it right now, but here is how you can get the mask to use: mask = df.index.get_level_values('a') == 4 Commented May 14, 2020 at 20:44

2 Answers

3

Assuming we start with df.

x = df.reset_index()
x.loc[x[x.a == 4].index, 'c'] = 1
x = x.set_index(['a', 'b', 'c'])
print(x)

        colA            
          x1  x2  x3  x4
a b   c                 
1 NaN 0  NaN NaN NaN NaN
4 NaN 1  NaN NaN NaN NaN

3 Comments

I'm guessing there's no way to do it directly, without resetting the index, right? I ask because in reality I'm working with a big dataframe.
resetting the index doesn't change the order of the data and the size of the data doesn't really matter.
big is pretty fluid.. how do you define big?
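On the question of doing it without reset_index: one possible sketch is to rebuild only the index from its level values, using the mask suggested in the question comments. This is not the answerer's method, just an alternative under stated assumptions. The frame below is an equivalent of the question's df built in a slightly different way, and the names mask and new_c are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: an equivalent of the question's frame, built explicitly.
idx = pd.MultiIndex.from_tuples(
    [(1, np.nan, 0), (4, np.nan, 0)], names=['a', 'b', 'c']
)
cols = pd.MultiIndex.from_product([['colA'], ['x1', 'x2', 'x3', 'x4']])
df = pd.DataFrame(np.nan, index=idx, columns=cols)

# Boolean mask of the rows whose 'a' label is 4.
mask = df.index.get_level_values('a') == 4

# New labels for level 'c': 1 where the mask holds, the old label otherwise.
new_c = np.where(mask, 1, df.index.get_level_values('c'))

# Rebuild the index from the untouched levels plus the modified one.
df.index = pd.MultiIndex.from_arrays(
    [
        df.index.get_level_values('a'),
        df.index.get_level_values('b'),
        new_c,
    ],
    names=df.index.names,
)

print(df)
```

Only the index is rebuilt here; the data columns are left alone. Whether that is meaningfully cheaper than reset_index / set_index on a large frame would need to be measured.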
2

Solution

Separate the index, process it and put it back together with the data.

Logic

  1. Separate index and process it as a dataframe
  2. Prepare a MultiIndex
  3. Either of the following two options:
    • combine data and MultiIndex together Method-1
    • update the index of the original dataframe Method-2

Code

# separate the index and process it
names = ['a', 'b', 'c'] # same as df.index.names
#dfd = pd.DataFrame(df.to_records())
dfd = df.index.to_frame().reset_index(drop=True)
dfd.loc[dfd['a']==4, ['c']] = 1

# prepare index for original dataframe: df
index = pd.MultiIndex.from_tuples([tuple(x) for x in dfd.loc[:, names].values], names=names)

## Method-1
# create new dataframe with updated index
dfn = pd.DataFrame(df.values, index=index, columns=df.columns)
# dfn --> new dataframe

## Method-2
# replace the index of original dataframe df
# (set_index returns a new dataframe, so assign the result back)
df = df.set_index(index)

Output:

            colA            
              x1  x2  x3  x4
a   b   c                   
1.0 NaN 0.0  NaN NaN NaN NaN
4.0 NaN 1.0  NaN NaN NaN NaN
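Note that the index labels in this output are floats (1.0, 4.0). That is most likely because .values on a frame with mixed integer and float columns upcasts everything to float before the tuples are built. As a hedged variation, and not part of the original answer, pd.MultiIndex.from_frame can build the index straight from the processed frame, which should keep each level's own dtype. The frame construction below is an assumed equivalent of the dummy data, and new_index is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Assumption: an equivalent of the dummy data, built explicitly.
idx = pd.MultiIndex.from_tuples(
    [(1, np.nan, 0), (4, np.nan, 0)], names=['a', 'b', 'c']
)
cols = pd.MultiIndex.from_product([['colA'], ['x1', 'x2', 'x3', 'x4']])
df = pd.DataFrame(np.nan, index=idx, columns=cols)

# Separate the index into a plain frame and edit level 'c' where a == 4.
dfd = df.index.to_frame(index=False)
dfd.loc[dfd['a'] == 4, 'c'] = 1

# Build the MultiIndex column by column, so integer levels stay integers.
new_index = pd.MultiIndex.from_frame(dfd)
df.index = new_index

print(df)
```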

Dummy Data

import pandas as pd
import numpy as np

df = pd.DataFrame({('colA', 'x1'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan},
('colA', 'x2'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan},
('colA', 'x3'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan},
('colA', 'x4'): {(1, np.nan, 0): np.nan, (4, np.nan, 0): np.nan}})

df.index.set_names(['a', 'b', 'c'], inplace=True)

2 Comments

@peter_b Here's another option.
I can see this certainly works but it really feels like an overkill!
