3

I have a pandas dataframe that looks like this:

import pandas as pd
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
      np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8,4),index=arrays,columns=['A','B','C','D'])

I want to add a column E such that df.loc[(slice(None),'one'),'E'] = 1 and df.loc[(slice(None),'two'),'E'] = 2, and I want to do this without iterating over ['one', 'two']. I tried the following:

df.loc[(slice(None),slice('one','two')),'E'] = pd.Series([1,2],index=['one','two'])

but it just adds a column E with NaN. What's the right way to do this?

3 Answers 3

2

Here is one way reindex

df.loc[:,'E']=pd.Series([1,2],index=['one','two']).reindex(df.index.get_level_values(1)).values
df
                A         B         C         D  E
bar one -0.856175 -0.383711 -0.646510  0.110204  1
    two  1.640114  0.099713  0.406629  0.774960  2
baz one  0.097198 -0.814920  0.234416 -0.057340  1
    two -0.155276  0.788130  0.761469  0.770709  2
foo one  1.593564 -1.048519 -1.194868  0.191314  1
    two -0.755624  0.678036 -0.899805  1.070639  2
qux one -0.560672  0.317915 -0.858048  0.418655  1
    two  1.198208  0.662354 -1.353606 -0.184258  2
Sign up to request clarification or add additional context in comments.

Comments

1

Methinks this is a good use case for Index.map:

df['E'] = df.index.get_level_values(1).map({'one':1, 'two':2})
df

                A         B         C         D  E
bar one  0.956122 -0.705841  1.192686 -0.237942  1
    two  1.155288  0.438166  1.122328 -0.997020  2
baz one -0.106794  1.451429 -0.618037 -2.037201  1
    two -1.942589 -2.506441 -2.114164 -0.411639  2
foo one  1.278528 -0.442229  0.323527 -0.109991  1
    two  0.008549 -0.168199 -0.174180  0.461164  2
qux one -1.175983  1.010127  0.920018 -0.195057  1
    two  0.805393 -0.701344 -0.537223  0.156264  2

Comments

0

You can just get it from df.index.labels:

df['E'] = df.index.labels[1] + 1
print(df)

Output:

                A         B         C         D  E
bar one  0.746123  1.264906  0.169694 -0.180074  1
    two -1.439730 -0.100075  0.929750  0.511201  2
baz one  0.833037  1.547624 -1.116807  0.425093  1
    two  0.969887 -0.705240 -2.100482  0.728977  2
foo one -0.977623 -0.800136 -0.361394  0.396451  1
    two  1.158378 -1.892137 -0.987366 -0.081511  2
qux one  0.155531  0.275015  0.571397 -0.663358  1
    two  0.710313 -0.255876  0.420092 -0.116537  2

Thanks to coldspeed, if you want different values (i.e x and y), use:

df['E'] = pd.Series(df.index.labels[1]).map({0: 'x', 1: 'y'}).tolist()
print(df)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.