pandas multiindex assignment from another dataframe

Question

I am trying to understand pandas MultiIndex DataFrames and how to assign data to them. Specifically, I'm interested in assigning entire blocks that match another, smaller data frame.

import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'], data=np.random.rand(4, 3))
df_

        1st       2nd       3rd
a  0.730251  0.468134  0.876926
b  0.104990  0.082461  0.129083
c  0.993608  0.117799  0.341811
d  0.784950  0.840145  0.016777

df is the same, except that all the values are NaN and there are two blocks, A and B. Now, if I want to assign the values from df_ to block A of df, I would imagine I can do something like

df.loc['A',:] = df_                # Runs, does not work
df.loc[('A','a'):('A','d')] = df_  # AssertionError (??) 'Start slice bound is non-scalar'
df.loc[('A','a'):('A','d')]        # No AssertionError (??)

idx = pd.IndexSlice
df.loc[idx['A', :]] = df_          # Runs, does not work

None of these work: they all leave the values in df as NaN, even though df.loc[idx['A', :]] gives me a slice of the data frame that exactly matches the sub-frame df_. So is this a case of setting values on a view? Explicitly iterating over the index of df_ does work:

# this is fine
for v in df_.index:
    df.loc[idx['A', v]] = df_.loc[v]

# this is also fine
for v in df_.index:
    df.loc['A', v] = df_.loc[v]
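A self-contained version of the row-by-row loop that works, as a sketch for experimenting. It reuses the question's setup but swaps the random data for np.arange values (an assumption made only so the result is easy to check). Each df.loc['A', v] names exactly one full row label, and the right-hand side is a Series that is aligned on the column labels, which df_ shares with df.

```python
import numpy as np
import pandas as pd

# The question's setup, with deterministic values instead of np.random.rand
ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

# One full row label ('A', v) per assignment; df_.loc[v] is a Series whose
# index ('1st', '2nd', '3rd') lines up with df's columns.
for v in df_.index:
    df.loc['A', v] = df_.loc[v]

print(df)
```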

Is it even possible to assign whole blocks like this (sort of like NumPy)? If not, that's fine, I am simply trying to understand how the system works.

There's a related question about index slicers, but it's about assigning a single value to a masked portion of the DataFrame, not about assigning blocks: "Pandas: Proper way to set values based on condition for subset of multiindex dataframe".

unutbu · Accepted Answer · 2015-02-10 14:08:58Z

41

When you use

df.loc['A', :] = df_

Pandas tries to align the index of df_ with the index of a sub-DataFrame of df. However, at the point in the code where alignment is performed, the sub-DataFrame has a MultiIndex, not the single index you see as the result of df.loc['A', :].
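A small sketch of the two indexes involved, assuming the question's setup (with np.arange data instead of random values). The internal code path differs between pandas versions, so treat this as an illustration of the mismatch rather than of pandas' internals: the frame you see after df.loc['A', :] has a flat index, while the rows being written still carry full ('A', x) labels, and df_'s flat labels only match the former.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

# What you see: selecting 'A' drops the outer level, leaving a flat index a..d.
shown = df.loc['A', :].index
# The full labels of the same rows in df: ('A', 'a') .. ('A', 'd').
rows = df.index[df.index.get_loc('A')]

print(shown)
print(rows)

# df_'s flat labels equal the displayed index but not the full row labels.
print(df_.index.equals(shown), df_.index.equals(rows))
```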

So the alignment fails because df_ has a single index, not the MultiIndex that is needed. To see that the index of df_ is indeed the problem, note that

ix_ = pd.MultiIndex.from_product([['A'], ['a', 'b', 'c', 'd']])
df_.index = ix_
df.loc['A', :] = df_
print(df)

succeeds, yielding something like

          1st       2nd       3rd
A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c  0.257192  0.624273  0.221279
  d  0.787023  0.056342  0.240735
B a       NaN       NaN       NaN
  b       NaN       NaN       NaN
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
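If you do want pandas to align by label, the MultiIndex does not have to be spelled out by hand: pd.concat with a dict uses the dict key as a new outer index level. A sketch assuming the question's setup (np.arange data for readability):

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

# {'A': df_} -> rows ('A', 'a'), ('A', 'b'), ('A', 'c'), ('A', 'd')
block = pd.concat({'A': df_})
print(block.index.tolist())

# The labels now match df's 'A' rows, so alignment fills them in.
df.loc['A', :] = block
print(df)
```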

Of course, you probably do not want to have to create a new MultiIndex every time you want to assign a block of values. So instead, to work around this alignment problem, you can use a NumPy array as the assignment value:

df.loc['A', :] = df_.values

Since df_.values is a NumPy array and an array has no index, no alignment is performed, and the assignment yields the same result as above. This trick of using NumPy arrays when you don't want index alignment applies in many situations when using Pandas.
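A sketch of the array route using .to_numpy(), which (since pandas 0.24) is the recommended spelling of .values; the setup is the question's, with np.arange data. Because nothing is aligned, the only check pandas can make is on shape: the second assignment below is deliberately one row short and is expected to fail with a ValueError.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

# Plain ndarray: written into the 'A' rows by position, no label alignment.
df.loc['A', :] = df_.to_numpy()
print(df)

# Without labels, only the shape protects you: 3 rows into a 4-row block fails.
try:
    df.loc['A', :] = df_.to_numpy()[:3]
    short_assignment_failed = False
except ValueError as err:
    short_assignment_failed = True
    print("ValueError:", err)
```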

Note also that assignment by NumPy array can help you perform more complicated assignments, such as assigning to rows that are not contiguous:

idx = pd.IndexSlice
df.loc[idx[:,('a','b')], :] = df_.values

yields (starting again from an all-NaN df)

In [85]: df
Out[85]: 
          1st       2nd       3rd
A a  0.229970  0.730824  0.784356
  b  0.584390  0.628337  0.318222
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN
B a  0.257192  0.624273  0.221279
  b  0.787023  0.056342  0.240735
  c       NaN       NaN       NaN
  d       NaN       NaN       NaN

for example.
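The same non-contiguous assignment as a runnable sketch (np.arange data, and a list ['a', 'b'] in the slicer, the form the pandas docs use for this kind of selection). Printing the selected labels first makes the positional fill order visible: the array's rows go to ('A', 'a'), ('A', 'b'), ('B', 'a'), ('B', 'b') in that order, so df_'s rows c and d end up under block B.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

idx = pd.IndexSlice
rows = idx[:, ['a', 'b']]

# Rows picked out by the slicer, in df's own order
print(df.loc[rows, :].index.tolist())

# The 4x3 array fills those rows top to bottom (no label alignment)
df.loc[rows, :] = df_.to_numpy()
print(df)
```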

edited Feb 10, 2015 at 14:08

answered Feb 10, 2015 at 13:19

unutbu

4 Comments

Matti Lyra Over a year ago

I see, that's a good explanation, thanks. I like df_.values, especially because it allows you to do all kinds of crazy partial assignments. You just need to be careful to index the data frames in the same order; I was wondering why some of my data suddenly flipped around (oops).

unutbu Over a year ago

If the order of the values is different then it might be easiest to make the index of df_ a MultiIndex and let Pandas deal with the alignment for you.

Matti Lyra Over a year ago

The order was different because I was being stupid, but I'll keep that in mind.

goweon Dec 9, 2024 at 15:18

wow. that behavior is frustratingly unintuitive and inconvenient
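Matti's row-order problem and unutbu's suggestion side by side, as a sketch assuming the question's setup; reversing df_'s rows is an illustrative stand-in for "a different order". The ndarray route writes by position and scrambles the rows, while the MultiIndex route (built here with pd.concat, one convenient way to add the outer level) aligns by label and puts every row where it belongs.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

shuffled = df_.iloc[::-1]  # same rows as df_, in order d, c, b, a

by_position = df.copy()
by_position.loc['A', :] = shuffled.to_numpy()       # positional: labels ignored

by_label = df.copy()
by_label.loc['A', :] = pd.concat({'A': shuffled})   # aligned on ('A', x) labels

print(by_position.loc['A'])  # row 'a' holds df_'s row 'd'
print(by_label.loc['A'])     # same values as df_
```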

behzad.nouri · Answer · 2015-02-10 13:28:11Z

3

I did 8480 a while back, which makes sub-frame assignment with columns work. So, as a workaround, you can do the following (here rf holds the same values as the question's df_):

>>> rf
     1st    2nd    3rd
a  0.730  0.468  0.877
b  0.105  0.082  0.129
c  0.994  0.118  0.342
d  0.785  0.840  0.017
>>> df.T['A'] = rf.T  # take transpose of both sides
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN

That said, you may want to report this as a bug on GitHub.
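The line df.T['A'] = rf.T only changes df if df.T hands back a view of df's data. That is chained assignment, which pandas' Copy-on-Write mode (opt-in in pandas 2.x and the default from pandas 3.0) no longer propagates back to df. A sketch of the same column-alignment idea that keeps the transposed frame explicitly and transposes it back, so it does not depend on views; the setup is the question's with np.arange data, and the name wide is just illustrative.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

wide = df.T          # rows '1st'..'3rd'; MultiIndex columns ('A','a')..('B','d')
wide['A'] = df_.T    # sub-frame assignment under top-level column key 'A',
                     # aligned on the inner column level a..d
df = wide.T          # rebind df explicitly instead of relying on a view
print(df)
```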

Edit: it seems that adding a dummy slice at the end also works:

>>> df.loc['A'][:] = rf
>>> df
       1st    2nd    3rd
A a  0.730  0.468  0.877
  b  0.105  0.082  0.129
  c  0.994  0.118  0.342
  d  0.785  0.840  0.017
B a    NaN    NaN    NaN
  b    NaN    NaN    NaN
  c    NaN    NaN    NaN
  d    NaN    NaN    NaN
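df.loc['A'][:] = rf is chained assignment as well: it writes into whatever df.loc['A'] returns, so it only reaches df when that intermediate object shares df's data (hence the view warning mentioned in the comments), and under Copy-on-Write it leaves df unchanged. A sketch that keeps the useful part of the trick, label alignment against the flat labels a..d, but finishes with a single .loc call. Here rf is a reversed copy of df_, an assumption chosen to show that the alignment really happens.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'],
                   data=np.arange(12, dtype=np.float64).reshape(4, 3))

rf = df_.iloc[::-1]  # same data as df_, rows in order d, c, b, a

# Align rf to the flat labels of block 'A' (and to df's columns) by label...
aligned = rf.reindex(index=df.loc['A'].index, columns=df.columns)
# ...then pass a plain array, so .loc does no further (MultiIndex) alignment.
df.loc['A', :] = aligned.to_numpy()
print(df)
```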

edited Feb 10, 2015 at 13:28

answered Feb 10, 2015 at 13:03

behzad.nouri

2 Comments

Matti Lyra Over a year ago

Doesn't the dummy index at the end create a view of the data frame, as documented here? At least I get the warning about assigning values to a view.

Navaneethan Santhanam Over a year ago

The suggestion after the edit worked for me, thank you!
