String replace within index/MultiIndex

Question

I have a MultiIndexed DataFrame with codes for countries as follows:

In [3]: idx = pd.MultiIndex.from_tuples([('AUS', 'a'), ('AUS', 'b'), ('BRA', 'a')])

In [4]: idx.names = ['country', 'foo']
In [5]: df = pd.DataFrame([4,5,6], index=idx)
In [6]: df
Out[6]: 
             0
country foo   
AUS     a    4
        b    5
BRA     a    6

I also have a dictionary with values to replace my codes with:

In [7]: codes = dict(AUS='Australia', BRA='Brazil')

I'd like to do the equivalent of df.replace(codes), but on the index levels (either all levels or a specific one, I don't mind).
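To see why a plain df.replace(codes) isn't enough, here is a minimal sketch that rebuilds the example frame from above and shows that DataFrame.replace only searches the cell values and leaves the index labels alone (the name result is just for illustration):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUS", "a"), ("AUS", "b"), ("BRA", "a")], names=["country", "foo"]
)
df = pd.DataFrame([4, 5, 6], index=idx)
codes = dict(AUS="Australia", BRA="Brazil")

# DataFrame.replace looks at the values only, never at the index labels,
# so the country codes in the index come through unchanged.
result = df.replace(codes)
print(result.index.equals(df.index))  # True
```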

The output would look like:

               0
country   foo
Australia a    4
          b    5
Brazil    a    6

I'm currently doing it in a very silly way indeed:

In [21]: replaced = [pd.Series(df.index.get_level_values(i)).replace(codes) for i in range(len(df.index.levels))]
In [22]: replaced_tuples = zip(*replaced)
In [23]: new_idx = pd.MultiIndex.from_tuples(replaced_tuples)
In [27]: df_replaced = pd.DataFrame(df.values, index=new_idx)
In [28]: df_replaced
Out[28]: 
             0
Australia a  4
          b  5
Brazil    a  6

What's the much nicer way that's staring me in the face? (Note that this method doesn't even preserve level names so it's all-round bad.)
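For comparison, here is a sketch of the same rebuild-the-index idea that keeps the level names, assuming a reasonably recent pandas. It maps each level's values through codes (values with no entry, like 'a' and 'b', are left unchanged) and builds the new index with MultiIndex.from_arrays, which accepts a names argument. new_arrays and df_replaced are illustrative names:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUS", "a"), ("AUS", "b"), ("BRA", "a")], names=["country", "foo"]
)
df = pd.DataFrame([4, 5, 6], index=idx)
codes = dict(AUS="Australia", BRA="Brazil")

# Map each level's values through the dict; values without an entry stay as-is.
new_arrays = [
    df.index.get_level_values(i).map(lambda v: codes.get(v, v))
    for i in range(df.index.nlevels)
]

# from_arrays takes a names argument, so the level names are preserved,
# and copying df keeps the data and column labels intact.
df_replaced = df.copy()
df_replaced.index = pd.MultiIndex.from_arrays(new_arrays, names=df.index.names)
print(df_replaced)
```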

EdChum · Accepted Answer · 2015-07-24 14:21:05Z

3

You can call set_levels on the MultiIndex and pass the new names. set_levels replaces the level values by position, so you have to build a list in the same order as the existing level values (df.index.levels[0]); you can't simply pass codes.values(), because the dict's order need not match the level's order:

In [380]:
country_code_list = [codes[x] for x in df.index.levels[0]]
df.index = df.index.set_levels(country_code_list, level='country')
df

Out[380]:
               0
country   foo   
Australia a    4
          b    5
Brazil    a    6
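A self-contained sketch of the set_levels approach, assuming pandas 2.x, where set_levels no longer accepts inplace, so the result is assigned back to df.index. The rows are deliberately listed with BRA first: from_tuples stores each level's values sorted, so iterating over the stored level itself (rather than over the values in order of appearance) is what keeps each new name paired with the right code. Using codes.get(code, code) to leave unmapped codes unchanged is an assumption of this sketch:

```python
import pandas as pd

codes = dict(AUS="Australia", BRA="Brazil")

# Rows deliberately start with BRA, so the order of appearance (BRA, AUS)
# differs from the sorted order in which from_tuples stores the level (AUS, BRA).
idx = pd.MultiIndex.from_tuples(
    [("BRA", "a"), ("AUS", "a"), ("AUS", "b")], names=["country", "foo"]
)
df = pd.DataFrame([6, 4, 5], index=idx)

# set_levels is positional: the k-th new value replaces df.index.levels[k].
# Iterate over the stored level itself so each name lines up with its code.
pos = df.index.names.index("country")
level = df.index.levels[pos]
new_values = [codes.get(code, code) for code in level]

# set_levels returns a new MultiIndex; assign it back instead of using inplace.
df.index = df.index.set_levels(new_values, level="country")
print(df)
```

Iterating df.index.get_level_values('country').unique() here would instead produce ['Brazil', 'Australia'] and swap the two countries, because that follows the row order rather than the level's order.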

answered Jul 24, 2015 at 14:21

EdChum


LondonRob · Accepted Answer · 2015-07-24 14:54:02Z

1

Here's a reasonable-looking way to do it. Not sure how it compares efficiency/readability-wise against @EdChum's answer:

In [46]: df.reset_index().replace(codes).set_index(df.index.names)
Out[46]: 
               0
country   foo   
Australia a    4
          b    5
Brazil    a    6

The downside is that replace substitutes values throughout the DataFrame, not just in the index columns.

On the upside, this way you get access to all of replace's functionality, such as regular expressions.
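As a sketch of the regular-expression upside, rebuilding the question's df: replace accepts a nested dict of patterns when regex=True, so codes can be matched by pattern within the country column only. The patterns are anchored (^...$) because pandas substitutes the matched text, so an unanchored pattern could replace just part of a code; patterns and out are illustrative names:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUS", "a"), ("AUS", "b"), ("BRA", "a")], names=["country", "foo"]
)
df = pd.DataFrame([4, 5, 6], index=idx)

# Nested dict {column: {pattern: replacement}}: only 'country' is searched.
# Anchoring each pattern makes the whole code get replaced, not a substring.
patterns = {"country": {r"^AU.$": "Australia", r"^BR.$": "Brazil"}}
out = df.reset_index().replace(patterns, regex=True).set_index(df.index.names)
print(out)
```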

If you only want to replace within the index, you can restrict the replacement with a nested dict, either:

codes_dict = dict(country=codes)

or

codes_dict = {k: codes for k in df.index.names}

Then pass codes_dict instead of codes in the call to replace; a nested dict of the form {column: {old: new}} only touches the named columns. Perfect!
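A minimal sketch of the difference between the flat and nested dicts. It uses a hypothetical partner data column that also holds country codes (not part of the question's frame), so the effect on non-index columns is visible:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUS", "a"), ("AUS", "b"), ("BRA", "a")], names=["country", "foo"]
)
# Hypothetical data column that happens to contain country codes too.
df = pd.DataFrame({"partner": ["BRA", "AUS", "AUS"]}, index=idx)
codes = dict(AUS="Australia", BRA="Brazil")

flat = df.reset_index()

# A flat dict replaces everywhere, including the 'partner' data column.
everywhere = flat.replace(codes).set_index(df.index.names)

# A nested dict {column: {old: new}} limits replacement to the index columns.
codes_dict = {k: codes for k in df.index.names}
index_only = flat.replace(codes_dict).set_index(df.index.names)

print(everywhere)
print(index_only)
```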

edited Jul 24, 2015 at 14:54

answered Jul 24, 2015 at 14:28

LondonRob
