Assigning values to Pandas Multiindex DataFrame by index level

Question

I have a Pandas multiindex dataframe and I need to assign values to one of the columns from a series. The series shares its index with the first level of the index of the dataframe.

import pandas as pd
import numpy as np
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))
print(df)
print(s)

out:

             A    B
bar one    NaN  NaN
    two    NaN  NaN
    three  NaN  NaN
baz one    NaN  NaN
foo one    NaN  NaN
    two    NaN  NaN

bar     True
baz    False
foo     True
dtype: bool

These don't work:

df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error

expected output:

             A     B
bar one    True   NaN
    two    True   NaN
    three  True   NaN
baz one    False  NaN
foo one    True   NaN
    two    True   NaN

JohnE · Accepted Answer · 2017-10-03 22:51:52Z

9

Series (and dictionaries) can be used just like functions with map and apply (thanks to @normanius for improving the syntax):

df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values

Or similarly:

df['A'] = df.reset_index(level=0)['level_0'].map(s).values

Results:

               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
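Here is a minimal sketch of the same idea, assuming a reasonably recent pandas. `Index.map` also accepts a Series as the mapper, so the intermediate `pd.Series(...)` wrapper can be skipped. The setup repeats the question's data so that the block runs on its own. `.to_numpy()` drops the index, which makes the assignment purely positional.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Look up each row's first-level label ('bar', 'baz', 'foo') in s.
mapped = df.index.get_level_values(0).map(s)
# Drop the index so the values are assigned by position, not aligned.
df['A'] = mapped.to_numpy()
print(df)
```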

edited Oct 3, 2017 at 22:51

answered May 8, 2015 at 12:51

JohnE


6 Comments

EdChum Over a year ago

I do wonder whether it's a bug that this doesn't work when the values passed have index values that can be aligned. Anyway, +1

EdChum Over a year ago

I couldn't figure out the right .loc syntax to assign the values either; hopefully a better pandas person will come along and answer that. To me this should just work, so there must be a way of doing it without resorting to map.

JohnE Over a year ago

Oh, I thought you were referring to something else. I figure map is as good a way to do this as any. You could also do it via merge, but I suspect that's a little slower (though maybe clearer to read).

normanius Over a year ago

@JohnE: I'd suggest writing df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values; it is robust and a bit clearer than your example.

JohnE Over a year ago

@normanius Thanks! I don't even remember answering this question but agree completely with your comment and have edited to include your suggestion.
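JohnE mentions merge as an alternative. The sketch below shows what that could look like. It assumes the labels in s are unique, because duplicate labels would make the merge duplicate rows. The column name `key` is illustrative only. A left merge keeps the rows of the left frame in their original order, so the result can be assigned back positionally.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# One row per df row, holding that row's first-level label.
keys = pd.DataFrame({'key': df.index.get_level_values(0)})
# Turn s into a two-column lookup table with columns 'key' and 'A'.
lookup = s.rename('A').rename_axis('key').reset_index()

# how='left' preserves the row order of `keys`, i.e. of df.
merged = keys.merge(lookup, on='key', how='left')
df['A'] = merged['A'].to_numpy()
print(df)
```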


keepAlive · Accepted Answer · 2021-07-31 15:15:19Z

2

df.A = s does not raise an error, but does nothing

Indeed, this should have worked. (Your point is actually related to mine.)

The workaround

>>> s.index = pd.Index((c,) for c in s.index)  # wrap each label in a 1-tuple
>>> df.A = s
>>> df
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN

Why does the above work?

Because when you directly do df.A = s without the workaround, you are actually trying to assign values located by pandas.Index coordinates into an instance of a subclass, i.e. an instance of pandas.MultiIndex (which somehow looks like a "counter-opposition" to the Liskov substitution principle). Look for yourself:

>>> type(s.index).__name__
'Index'

whereas

>>> type(df.index).__name__
'MultiIndex'

Hence the workaround, which consists of turning s's index into a one-level pandas.MultiIndex instance:

>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'

and nothing has visibly changed:

>>> s
bar     True
baz    False
foo     True
dtype: bool
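The one-level MultiIndex that the workaround relies on can also be built explicitly with `pd.MultiIndex.from_arrays`. This sketch checks that both constructions give the same index. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series([True, False, True], index=['bar', 'baz', 'foo'])

# The answer's construction: an iterable of 1-tuples becomes a MultiIndex.
via_tuples = pd.Index((c,) for c in s.index)
# An explicit alternative: a MultiIndex built from a single array.
via_arrays = pd.MultiIndex.from_arrays([s.index])

print(type(via_tuples).__name__, via_tuples.nlevels)  # MultiIndex 1
print(via_tuples.equals(via_arrays))
```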

A thought: from many points of view (mathematical, ontological), all this suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, not the other way around as it currently is.
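The Index/MultiIndex mismatch can also be bridged without modifying s at all, by letting pandas broadcast s along one level of df's index. The sketch below uses the `level` parameter of `Series.reindex`. That parameter matches s's labels against level 0 of the target MultiIndex and repeats each value for every row that shares the label. It has only been reasoned through, not checked against every pandas version.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Broadcast s across level 0: every ('bar', *) row gets s['bar'], and so on.
aligned = s.reindex(df.index, level=0)
# aligned now carries df's full MultiIndex, so plain assignment aligns row by row.
df['A'] = aligned
print(df)
```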

answered Jul 31, 2021 at 15:15

keepAlive


3 Comments

keepAlive Over a year ago

@EdChum the above workaround may give you an idea of the type of bug that is currently at work.

danek Dec 5, 2024 at 5:32

The thought at the end is truly bright; I wonder if anyone has suggested this improvement already.

keepAlive Dec 6, 2024 at 20:09

Good question @danek. I guess pandas is too far along in its development to be redesigned at such a fundamental level. But, TBH, I dunno.

ppt000 · Accepted Answer · 2023-01-04 09:43:27Z

You can use the join method on the df DataFrame, but you need to name the indexes and the series accordingly:

>>> df.index.names = ('lvl0', 'lvl1')
>>> s.index.name = 'lvl0'
>>> s.name = 'new_col'

Then the join method creates a new column in the DataFrame:

>>> df.join(s)
              A    B  new_col
lvl0 lvl1
bar  one    NaN  NaN     True
     two    NaN  NaN     True
     three  NaN  NaN     True
baz  one    NaN  NaN    False
foo  one    NaN  NaN     True
     two    NaN  NaN     True

To assign it to an existing column:

>>> df['A'] = df.join(s)['new_col']
>>> df
                A    B
lvl0 lvl1
bar  one     True  NaN
     two     True  NaN
     three   True  NaN
baz  one    False  NaN
foo  one     True  NaN
     two     True  NaN
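Here is a self-contained variant of the join approach that leaves s untouched. `rename` and `rename_axis` return renamed copies instead of mutating `s` and `df.index` in place. The names `lvl0`, `lvl1` and `new_col` are simply the labels used above, and `named` is an illustrative variable name.

```python
import numpy as np
import pandas as pd

idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Name the index levels so join can tell which level s lines up with.
df = df.rename_axis(['lvl0', 'lvl1'])
# Name a copy of s and its index; s itself keeps name=None.
named = s.rename('new_col').rename_axis('lvl0')

# join matches on the shared level name 'lvl0' and keeps df's rows.
df['A'] = df.join(named)['new_col']
print(df)
```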
