I'm a newbie to both Python and Pandas.

I am trying to construct a dataframe, and then later populate it with values.

I have constructed my dataframe:

from pandas import *

ageMin = 21
ageMax = 31
ageStep = 2

bins_sumins = [0, 10000, 20000]
bins_age = list(range(ageMin, ageMax, ageStep))
indeks_sex = ['M', 'F']
indeks_age  =  ['[{0}-{1})'.format(bins_age[i-1], bins_age[i]) for i in range(1, len(bins_age))]
indeks_sumins = ['[{0}-{1})'.format(bins_sumins[i-1], bins_sumins[i]) for i in range(1, len(bins_sumins))]
indeks = MultiIndex.from_product([indeks_age, indeks_sex, indeks_sumins], names=['Age', 'Sex', 'Sumins'])

cols = ['A', 'B', 'C', 'D']

df = DataFrame(data = 0, index = indeks, columns = cols)

So far all is well. I am able to assign a value to a whole set of rows:

>>> df['A']['[21-23)']['M'] = 1
>>> df
                           A  B  C  D
Age     Sex Sumins                   
[21-23) M   [0-10000)      1  0  0  0
            [10000-20000)  1  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[23-25) M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[25-27) M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[27-29) M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0

However, setting the value of a single position does not work:

>>> df['B']['[21-23)']['M']['[10000-20000)'] = 2
>>> df
                           A  B  C  D
Age     Sex Sumins                   
[21-23) M   [0-10000)      1  0  0  0
            [10000-20000)  1  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[23-25) M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[25-27) M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[27-29) M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[16 rows x 4 columns]

What is going on here? I am open to the idea that I have completely misunderstood how multi-indexing works. Anyone?

  • You're doing a chained assignment. You should use loc. Check out the indexing docs: pandas.pydata.org/pandas-docs/stable/indexing.html Commented Apr 16, 2014 at 12:59
  • Thank you. I have browsed the documents you linked to and it certainly shed some light on the issue. Commented Apr 18, 2014 at 9:29

1 Answer


First off, have a look at the pandas docs on chained indexing.

Second, read the docs section on why a MultiIndex needs to be sorted.

That will get you to this solution:

In [46]: df = df.sort_index()

In [47]: df.loc['[21-23)', 'M', '[10000-20000)'] = 2

In [48]: df
Out[48]: 
                           A  B  C  D
Age     Sex Sumins                   
[21-23) F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        M   [0-10000)      0  0  0  0
            [10000-20000)  2  2  2  2
[23-25) F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[25-27) F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
[27-29) F   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0
        M   [0-10000)      0  0  0  0
            [10000-20000)  0  0  0  0

[16 rows x 4 columns]
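
Note that the assignment above sets every column in that row. If the goal is a single cell, one option is to pass the row key as a tuple (one label per index level) and the column label as the second argument to loc. A minimal sketch, assuming a smaller frame rebuilt in the same shape as the one in the question:

```python
import pandas as pd

# Rebuild a small frame shaped like the one in the question (two age bands only).
index = pd.MultiIndex.from_product(
    [['[21-23)', '[23-25)'], ['M', 'F'], ['[0-10000)', '[10000-20000)']],
    names=['Age', 'Sex', 'Sumins'],
)
df = pd.DataFrame(0, index=index, columns=['A', 'B', 'C', 'D']).sort_index()

# Row key as a tuple, column label second: a single loc call, no chaining.
df.loc[('[21-23)', 'M', '[10000-20000)'), 'B'] = 2

print(df)
```

Because the row and column are selected in one loc call, the assignment is applied to df itself rather than to an intermediate object.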

pandas 0.14 will have some additional ways of slicing a MultiIndex.
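
The additional slicing presumably refers to the MultiIndex slicers, available through pd.IndexSlice. A rough sketch of how they can be used for assignment, again assuming a small stand-in frame and a sorted index:

```python
import pandas as pd

# Small stand-in for the frame in the question.
index = pd.MultiIndex.from_product(
    [['[21-23)', '[23-25)'], ['M', 'F'], ['[0-10000)', '[10000-20000)']],
    names=['Age', 'Sex', 'Sumins'],
)
df = pd.DataFrame(0, index=index, columns=['A', 'B', 'C', 'D']).sort_index()

idx = pd.IndexSlice

# All age bands, males only, all sum-insured bands, column 'A' only.
df.loc[idx[:, 'M', :], 'A'] = 1

print(df)
```

Each position inside idx[...] corresponds to one index level, so a bare colon means "every label on this level".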


3 Comments

Thank you for your quick reply. I did a bit more investigating and read up on the links you suggested. To set the value of only ONE element (i.e. a specific column in a specific row), first pick out the relevant Series from the data frame (df['A']), and then use loc on that Series: df['A'].loc['[21-23)', 'M', '[10000-20000)'].
Thank you @mortysporty for this comment!!! You should answer your own question explaining this.
all links not found :(
