I have a dataframe with a large multiindex, sourced from a vast number of csv files. Some of those files have errors in the various labels, ie. "window" is missspelled as "winZZw", which then causes problems when I select all windows with df.xs('window', level='middle', axis=1).
So I need a way to simply replace winZZw with window.
Here's a very minimal sample df: (lets assume the data and the 'roof', 'window'… strings come from some convoluted text reader)
header = pd.MultiIndex.from_product(['roof', 'window', 'basement'], names = ['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000','01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product(['roof', 'winZZw', 'basement'], names = ['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product(['roof', 'door', 'basement'], names = ['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)
Now I want to xs a new dataframe for all the houses that have a window at their middle level: windf = df.xs('window', level='middle', axis=1)
But this obviously misses the misspelled winZZw.
So, how I replace winZZw with window?
The only way I found was to use set_levels, but if I understood that correctly, I need to feed it the whole level, ie
df.columns.set_levels([u'window',u'window', u'door'], level='middle',inplace=True)
but this has two issues:
- I need to pass it the whole index, which is easy in this sample, but impossible/stupid for a thousand column df with hundreds of labels.
- It seems to need the list backwards (now, my first entry in the df has door in the middle, instead of the window it had). That can probably be fixed, but it seems weird
I can work around these issues by xsing a new df of only winZZws, and then setting the levels with set_levels(df.shape[1]*[u'window'], level='middle') and then concatting it together again, but I'd like to have something more straightforward analog to str.replace('winZZw', 'window'), but I can't figure out how.
MultiIndex.from_productneeds list of list as input.['roof', 'window', 'basement']to[['roof'],[ 'window'], ['basement']]to make it work. So perhaps you are using a pandas that is too old.