I have a MultiIndexed DataFrame with codes for countries as follows:

In [3]: idx = pd.MultiIndex.from_tuples([('AUS', 'a'), ('AUS', 'b'), ('BRA', 'a')])

In [4]: idx.names = ['country', 'foo']
In [5]: df = pd.DataFrame([4,5,6], index=idx)
In [6]: df
Out[6]: 
             0
country foo   
AUS     a    4
        b    5
BRA     a    6

I also have a dictionary with values to replace my codes with:

In [7]: codes = dict(AUS='Australia', BRA='Brazil')

I'd like to do the equivalent of df.replace(codes), but on the index levels (either all levels or one specific level, I don't mind).

The output would look like:

               0
country   foo
Australia a    4
          b    5
Brazil    a    6

I'm currently doing it in a very silly way indeed:

In [21]: replaced = [pd.Series(df.index.get_level_values(i)).replace(codes) for i in range(len(df.index.levels))]
In [22]: replaced_tuples = zip(*replaced)
In [23]: new_idx = pd.MultiIndex.from_tuples(replaced_tuples)
In [27]: df_replaced = pd.DataFrame(df.values, index=new_idx)
In [28]: df_replaced
Out[28]: 
             0
Australia a  4
          b  5
Brazil    a  6

What's the much nicer way that's staring me in the face? (Note that this method doesn't even preserve level names so it's all-round bad.)

2 Answers

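You can call set_levels on the MultiIndex and pass the new labels. The replacement is positional, so the list you pass has to be in the same order as the existing values of that level (df.index.levels[0]), which is why it is built from the level rather than from the dict. set_levels returns a new index (the inplace argument is gone in recent pandas versions), so assign the result back:

In [380]:
country_code_list = [codes[x] for x in df.index.levels[0]]
df.index = df.index.set_levels(country_code_list, level='country')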
df

Out[380]:
               0
country   foo   
Australia a    4
          b    5
Brazil    a    6
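
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same set_levels idea. It rebuilds the frame from the question; the use of codes.get(x, x) to leave unmapped codes untouched and the name df_replaced are my own choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.MultiIndex.from_tuples(
    [('AUS', 'a'), ('AUS', 'b'), ('BRA', 'a')], names=['country', 'foo']
)
df = pd.DataFrame([4, 5, 6], index=idx)
codes = dict(AUS='Australia', BRA='Brazil')

# Translate each existing level value, in the level's own order.
# codes.get(x, x) keeps any code that has no entry in the dict (assumption).
new_levels = [codes.get(x, x) for x in df.index.levels[0]]

# set_levels returns a new MultiIndex; assign it to a copy of the frame.
df_replaced = df.copy()
df_replaced.index = df.index.set_levels(new_levels, level='country')

print(df_replaced)
```

Because only the level labels are swapped, the level names and the data are left alone.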

Here's a reasonable-looking way to do it. Not sure how it compares efficiency/readability-wise against @EdChum's answer:

In [46]: df.reset_index().replace(codes).set_index(df.index.names)
Out[46]: 
               0
country   foo   
Australia a    4
          b    5
Brazil    a    6

Clearly the downside here is that the replace will replace throughout the DataFrame, not just in the index columns.

On the upside, doing it this way, you get access to all the functionality of replace like regular expressions.

If you really care about only replacing within the index you can do either:

codes_dict = dict(country=codes)

or

codes_dict = {k: codes for k in df.index.names}

Then pass codes_dict instead of codes in the call to replace, so that only the named index columns are touched.
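
A rough sketch of that restricted version, assuming a frame whose data column also happens to contain a country code, to show that it is left alone. The column name 'value' and its contents are made up for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('AUS', 'a'), ('AUS', 'b'), ('BRA', 'a')], names=['country', 'foo']
)
# Hypothetical data column that contains 'AUS' as an ordinary value.
df = pd.DataFrame({'value': ['AUS', 'x', 'y']}, index=idx)
codes = dict(AUS='Australia', BRA='Brazil')

# Nested dict: the outer key is the column to look in, the inner dict is the mapping.
codes_dict = {'country': codes}

result = (
    df.reset_index()                    # index levels become ordinary columns
      .replace(codes_dict)              # replace only inside the 'country' column
      .set_index(list(df.index.names))  # restore the original index levels
)

print(result)
```

The 'AUS' in the data column survives because the nested dict limits the replacement to the country column.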
