I'm seeing some weird pandas behavior here. I have a DataFrame that looks like this:

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

In [14]: df
Out[14]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN   NaN   NaN
(2, b)   NaN   NaN   NaN

I can set the value of an arbitrary element

In [15]: df['Col 2'].loc[('1', 'b')] = 6

In [16]: df
Out[16]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN     6   NaN
(2, b)   NaN   NaN   NaN

But when I go to reference the element that I just set using the same syntax, I get

In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'

Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?

Edit

Apparently, wrapping the tuple index in a list works.

In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b)    6
Name: Col 2, dtype: object

I'm still getting some weird behavior in my actual use case, though, so it would be nice to know whether this usage is not recommended.

  • The response in this question suggests it's not recommended usage because of the ambiguity between tuple keys and MultiIndex selection. Commented Oct 21, 2016 at 23:15
  • Wrapping the tuple index in a list worked for me. Commented Aug 30, 2022 at 11:02

1 Answer

The tuple inside the selection brackets is interpreted as a sequence of the labels you want to retrieve, as if you had passed ['1', 'b'] as the argument. Hence the KeyError: pandas looks for the key '1' and, of course, doesn't find it.

That's why it works when you add the extra brackets: the argument then becomes a sequence of one element, your tuple.
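As a minimal sketch of the difference, assuming a reasonably recent pandas: the example below builds a Series on a flat index of tuples and looks up the same label both ways. The sample values and the use of `tupleize_cols=False` are assumptions added for illustration (recent pandas versions may otherwise turn a list of tuples into a MultiIndex automatically), and the exact exception raised by the bare-tuple lookup may differ between versions.

```python
import pandas as pd

# tupleize_cols=False keeps each tuple as a single plain label
# instead of letting pandas build a MultiIndex from the tuples.
flat_index = pd.Index([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')],
                      tupleize_cols=False)
s = pd.Series([10, 20, 30, 40], index=flat_index, name='Col 2')

# A one-element list holding the tuple is unambiguous:
# it asks for exactly one label, the tuple itself.
selected = s.loc[[('1', 'b')]]
print(selected)

# A bare tuple is not treated as a single label on a flat index,
# so this lookup is expected to fail.
try:
    s.loc[('1', 'b')]
except Exception as exc:  # exception type depends on the pandas version
    print(type(exc).__name__)
```

Note that the list form returns a one-row Series rather than a scalar, so you still need something like `selected.iloc[0]` to get the value itself.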

You should avoid relying on how pandas resolves the ambiguity between list and tuple arguments in selection. The behavior can also differ depending on whether the index is a simple index or a MultiIndex.

In any case, since you ask about recommendations, mine is to avoid building simple indexes made of tuples: pandas works better and is more powerful if you build an actual MultiIndex instead:

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))

df['Col 2'].loc[('1', 'b')] = 6

df['Col 2'].loc[('1', 'b')]
Out[13]: 6

df
Out[14]: 
    Col 1 Col 2 Col 3
1 a   NaN   NaN   NaN
2 a   NaN   NaN   NaN
1 b   NaN     6   NaN
2 b   NaN   NaN   NaN