How to replace a string in a pandas multiindex?

Question

I have a dataframe with a large MultiIndex, sourced from a vast number of CSV files. Some of those files have errors in the various labels, e.g. "window" is misspelled as "winZZw", which then causes problems when I select all windows with df.xs('window', level='middle', axis=1).

So I need a way to simply replace winZZw with window.

Here's a very minimal sample df (let's assume the data and the 'roof', 'window', … strings come from some convoluted text reader):

header = pd.MultiIndex.from_product(['roof', 'window', 'basement'], names = ['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000','01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product(['roof', 'winZZw', 'basement'], names = ['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product(['roof', 'door', 'basement'], names = ['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)

Now I want to use xs to get a new dataframe for all the houses that have a window at their middle level: windf = df.xs('window', level='middle', axis=1)

But this obviously misses the misspelled winZZw.
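A minimal sketch that reproduces the problem on a smaller frame. The three-row integer data is made up for illustration; the level names and labels match the question.

```python
import numpy as np
import pandas as pd

# Three columns that differ only in the 'middle' label; one label is misspelled.
columns = pd.MultiIndex.from_tuples(
    [("roof", "window", "basement"),
     ("roof", "winZZw", "basement"),
     ("roof", "door", "basement")],
    names=["top", "middle", "bottom"],
)
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=columns)

# Selecting on the correctly spelled label finds only one of the two window columns.
windf = df.xs("window", level="middle", axis=1)
print(windf.shape[1])
```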

So, how do I replace winZZw with window?

The only way I found was to use set_levels, but if I understood that correctly, I need to feed it the whole level, i.e.

df.columns.set_levels([u'window',u'window', u'door'], level='middle',inplace=True)

but this has two issues:

I need to pass it the whole index, which is easy in this sample, but impossible/stupid for a thousand column df with hundreds of labels.
It seems to need the list backwards (now the first entry in my df has door in the middle, instead of the window it had). That can probably be fixed, but it seems weird.
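A likely explanation for the second issue, sketched under the assumption of a reasonably recent pandas (the codes attribute was called labels in older releases): set_levels replaces the list of distinct values stored for a level, not the per-column labels. Each column points into that list by integer position, and the list's order does not have to match the column order. The small index below is made up for illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("roof", "window", "basement"),
     ("roof", "winZZw", "basement"),
     ("roof", "door", "basement")],
    names=["top", "middle", "bottom"],
)

# Distinct values stored for the 'middle' level; their order is an internal detail.
print(list(columns.levels[1]))
# Integer positions into that list, one per column.
print(list(columns.codes[1]))
# Labels in column order, which is what xs() matches against.
print(list(columns.get_level_values("middle")))
```

Passing a list to set_levels therefore relabels the stored values position by position, which can look "backwards" when the stored order differs from the column order.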

I can work around these issues by using xs to extract a new df of only the winZZw columns, setting the levels with set_levels(df.shape[1]*[u'window'], level='middle'), and then concatenating everything together again. But I'd like something more straightforward, analogous to str.replace('winZZw', 'window'), and I can't figure out how.

The code seems to contain an error; please check it first. MultiIndex.from_product needs a list of lists as input.
– doraemon, Commented Jul 11, 2018 at 8:40
In @jezrael's answer, ['roof', 'window', 'basement'] was changed to [['roof'], ['window'], ['basement']] to make it work. So perhaps you are using a pandas version that is too old.
– doraemon, Commented Jul 11, 2018 at 8:57

jezrael · Accepted Answer · 2018-07-11 08:42:31Z

2

Use rename and specify the level:

import numpy as np
import pandas as pd

# Build three single-column frames; from_product needs a list of lists.
header = pd.MultiIndex.from_product([['roof'], ['window'], ['basement']], names=['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000', '01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product([['roof'], ['winZZw'], ['basement']], names=['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product([['roof'], ['door'], ['basement']], names=['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)

# Rename the misspelled label, but only in the 'middle' level of the columns.
df = df.rename(columns={'winZZw': 'window'}, level='middle')
print(df.head())

top             roof                    
middle        window                door
bottom      basement  basement  basement
2000-01-01 -0.131052 -1.189049  1.310137
2000-02-01 -0.200646  1.893930  2.124765
2000-03-01 -1.690123 -2.128965  1.639439
2000-04-01 -0.794418  0.605021 -2.810978
2000-05-01  1.528002 -0.286614  0.736445
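As a quick check that the rename solves the original problem, the sketch below repeats it on a small made-up frame and then runs the xs selection from the question. It assumes a pandas version in which rename accepts the level keyword.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("roof", "window", "basement"),
     ("roof", "winZZw", "basement"),
     ("roof", "door", "basement")],
    names=["top", "middle", "bottom"],
)
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=columns)

# Fix the misspelled label in the 'middle' level only.
df = df.rename(columns={"winZZw": "window"}, level="middle")

# The selection from the question now picks up both window columns.
windf = df.xs("window", level="middle", axis=1)
print(windf.shape[1])
```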


3 Comments

JC_CL Over a year ago

I admit I forgot to test it with the sample, since it failed with TypeError: rename() got an unexpected keyword argument "level" on my real data, which seemed to indicate that rename simply can't work on the index.

jezrael Over a year ago

@JC_CL - What is your pandas version? You need 0.20.0+; see the release note "Addition of a level keyword to DataFrame/Series.rename to rename labels in the specified level of a MultiIndex".

JC_CL Over a year ago

Damn, still on 0.18. I really should not have started working with something that is not stable…

Clement H. · Answer · 2024-03-12 15:37:48Z

0

A more general solution, which replaces a substring in every level of a MultiIndex, is the following:

df.columns = pd.MultiIndex.from_tuples([tuple(x.replace("to_replace", "new_str") for x in tuple_index) for tuple_index in df.columns], names=df.columns.names)
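If only one level should be touched, or the level keyword of rename is not available in the installed pandas, the same idea can be applied level by level. The sketch below is one possible variant, not part of pandas: the helper name replace_in_level is hypothetical, it assumes the level names are unique, and it leaves non-string labels unchanged.

```python
import pandas as pd


def replace_in_level(index, level, old, new):
    """Return a new MultiIndex with `old` replaced by `new` in the string labels of one level.

    Hypothetical helper for illustration; `level` is a level name.
    """
    target = list(index.names).index(level)
    arrays = []
    for pos in range(index.nlevels):
        values = list(index.get_level_values(pos))
        if pos == target:
            # Only string labels are rewritten; anything else passes through.
            values = [v.replace(old, new) if isinstance(v, str) else v for v in values]
        arrays.append(values)
    # Rebuild the index and keep the original level names.
    return pd.MultiIndex.from_arrays(arrays, names=index.names)


columns = pd.MultiIndex.from_tuples(
    [("roof", "window", "basement"),
     ("roof", "winZZw", "basement"),
     ("roof", "door", "basement")],
    names=["top", "middle", "bottom"],
)
new_columns = replace_in_level(columns, "middle", "winZZw", "window")
print(list(new_columns.get_level_values("middle")))
```

On a dataframe this would be used as df.columns = replace_in_level(df.columns, 'middle', 'winZZw', 'window').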
