27

I am trying to understand pandas MultiIndex DataFrames and how to assign data to them. Specifically I'm interested in assigning entire blocks that match another smaller data frame.

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'], data=np.random.rand(4, 3))
df_

    1st     2nd     3rd
a   0.730251    0.468134    0.876926
b   0.104990    0.082461    0.129083
c   0.993608    0.117799    0.341811
d   0.784950    0.840145    0.016777

df is the same except that all the values are NaN and there are two blocks A and B. Now if I want to assign the values from df_ to df I would imagine I can do something like

df.loc['A',:] = df_                # Runs, does not work
df.loc[('A','a'):('A','d')] = df_  # AssertionError (??) 'Start slice bound is non-scalar'
df.loc[('A','a'):('A','d')]        # No AssertionError (??)

idx = pd.IndexSlice
df.loc[idx['A', :]] = df_          # Runs, does not work

None of these work, they leave all the values in df as NaN, although df.loc[idx['A', :]] gives me a slice of the data frame that exactly matches that of the sub frame (df_). So is this a case of setting values on a view? Explicitly iterating over the index in df_ works

# this is fine
for v in df_.index:
    df.loc[idx['A', v]] = df_.loc[v]

# this is also fine
for v in df_.index:
    df.loc['A', v] = df_.loc[v]

Is it even possible to assign whole blocks like this (sort of like NumPy)? If not, that's fine, I am simply trying to understand how the system works.

There's a related question about index slicers, but it's about assigning a single value to a masked portion of the DataFrame, not about assigning blocks. Pandas : Proper way to set values based on condition for subset of multiindex dataframe

2 Answers 2

41

When you use

df.loc['A', :] = df_

Pandas tries to align the index of df_ with the index of a sub-DataFrame of df. However, at the point in the code where alignment is performed, the sub-DataFrame has a MultiIndex, not the single index you see as the result of df.loc['A', :].

So the alignment fails because df_ has a single index, not the MultiIndex that is needed. To see that the index of df_ is indeed the problem, note that

ix_ = pd.MultiIndex.from_product([['A'], ['a', 'b', 'c', 'd']])
df_.index = ix_
df.loc['A', :] = df_
print(df)

succeeds, yielding something like

A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c  0.257192  0.624273  0.221279
  d  0.787023  0.056342  0.240735
B a       NaN       NaN       NaN
  b       NaN       NaN       NaN
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN

Of course, you probably do not want to have to create a new MultiIndex every time you want to assign a block of values. So instead, to work around this alignment problem, you can use a NumPy array as the assignment value:

df.loc['A', :] = df_.values

Since df_.values is a NumPy array and an array has no index, no alignment is performed and the assignment yields the same result as above. This trick of using a NumPy arrays when you don't want alignment of indexes applies to many situations when using Pandas.

Note also that assignment-by-NumPy-array can also help you perform more complicated assignments such as to rows which are not contiguous:

idx = pd.IndexSlice
df.loc[idx[:,('a','b')], :] = df_.values

yields

In [85]: df
Out[85]: 
          1st       2nd       3rd
A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
B a  0.257192  0.624273  0.221279
  b  0.787023  0.056342  0.240735
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN

for example.

Sign up to request clarification or add additional context in comments.

4 Comments

I see that's a good explanation, thanks. I like the df_.values, especially because it allows you to do all kinds of crazy partial assignments. Just need to be careful to index the data frames in the same order, I was wondering why some of my data suddenly flips around (oops).
If the order of the values is different then it might be easiest to make the index of df_ a MultiIndex and let Pandas deal with the alignment for you.
The order was different because I was being stupid, but I'll keep that in mind.
wow. that behavior is frustratingly unintuitive and inconvenient
3

I did 8480 a while back, which makes sub-frame assignment with columns work. so, you may do as follows as a work-around:

>>> rf
     1st    2nd    3rd
a  0.730  0.468  0.877
b  0.105  0.082  0.129
c  0.994  0.118  0.342
d  0.785  0.840  0.017
>>> df.T['A'] = rf.T  # take transpose of both sides
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN

that said, you may want to post this as a bug on github.

edit: seems that adding a dummy slice at the end also works:

>>> df.loc['A'][:] = rf
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN

2 Comments

doesn't the dummy index at the end create a view of the data frame as documented here - I at least get the warning about assigning values to a view
The suggestion after the edit worked for me, thank you!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.