There are already a couple of questions on SO about this, most notably this one. However, none of the answers work for me, and quite a few links to the docs (especially on lexsorting) are broken, so I'm asking again.

I'm trying to do something (seemingly) very simple. Consider the following MultiIndexed DataFrame:

import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)

Now I want to set all values in column 0 to some value (say np.nan) for the observations where the second level is "one". I've failed with:

df.loc(axis=0)[:, "one"][0] = 1 # setting with copy warning

and

df.loc(axis=0)[:, "one", 0] = 1

which either yields a warning about length of keys exceeding length of index, or one about a lack of lexsorting to sufficient depth.

What is the correct way to do this?

1 Answer

I think you can use loc with a tuple to select rows from the MultiIndex and 0 to select the column:

import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

# fixed seed so the output below is reproducible
np.random.seed(0)
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)
print(df)
                     0         1
first second                    
bar   one     1.764052 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

df.loc[('bar', "one"), 0] = 1
print(df)
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

If you need to set all rows where level second equals one, use slice(None) to select everything in level first:

df.loc[(slice(None), "one"), 0] = 1
print(df)
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     1.000000  0.144044
      two     2.240893  1.454274
foo   one     1.000000  0.761038
      two    -0.977278  0.121675
qux   one     1.000000  0.443863
      two    -0.151357  0.333674
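
For comparison, here is a minimal sketch of two equivalent ways to write the same selection: pd.IndexSlice, which lets you type ":" instead of slice(None), and a boolean mask built with get_level_values. The frame here is filled with np.arange instead of random numbers so the result is easy to check; that data, and the names idx and mask, are my own choices for illustration and not part of the question.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

# Illustrative data (not the random values from the question)
df = pd.DataFrame(np.arange(16.0).reshape(8, 2), index=index)

# Option 1: pd.IndexSlice builds the tuple (slice(None), 'one') for you
idx = pd.IndexSlice
df.loc[idx[:, 'one'], 0] = np.nan

# Option 2: boolean mask over the values of the 'second' level
mask = df.index.get_level_values('second') == 'one'
df.loc[mask, 1] = -1.0

print(df)
```

The mask variant does not rely on MultiIndex slicing at all, so it may be the easier one to fall back on if the slicer gives you trouble.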

See the pandas docs on MultiIndex / advanced indexing (section "Using slicers").
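
On the lexsorting message mentioned in the question: the index in the example is already sorted, but if yours is not, sorting it first with sort_index() is a cheap precaution before slicing. Whether an unsorted index actually raises depends on the pandas version and on the kind of slice, so treat the following as a sketch under that assumption; the shuffled tuples are made-up data for illustration.

```python
import numpy as np
import pandas as pd

# Deliberately unsorted MultiIndex (hypothetical data)
tuples = [('qux', 'two'), ('bar', 'one'), ('foo', 'two'), ('baz', 'one'),
          ('bar', 'two'), ('qux', 'one'), ('baz', 'two'), ('foo', 'one')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.zeros((8, 2)), index=index)

print(df.index.is_monotonic_increasing)  # False: not sorted

# Sort all index levels lexicographically before slicing
df = df.sort_index()
print(df.index.is_monotonic_increasing)  # True

df.loc[(slice(None), 'one'), 0] = 1
print(df)
```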
