I'd like to drop rows from a pandas DataFrame by their MultiIndex values.
I've tried quite a few things; below is the attempt that I think came closest. (I'll explain the full problem, since there might be an alternative solution using a completely different approach.) From a correlation matrix, I'd like to get the pairs of columns that correlate most strongly. I use unstack and put the results in a DataFrame:
In [263]: corr_df = pd.DataFrame(total.corr().unstack())
Then I select the high correlations (I should really include the strongly negative ones as well).
In [264]: high = corr_df[(corr_df[0] > 0.5) & (corr_df[0] < 1.0)]
In [236]: print high
                                                  0
residual sugar       density               0.552517
free sulfur dioxide  total sulfur dioxide  0.720934
total sulfur dioxide free sulfur dioxide   0.720934
                     wine                  0.700357
density              residual sugar        0.552517
wine                 total sulfur dioxide  0.700357
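On the point about negatives: one way to keep strong negative correlations is to filter on the absolute value. This is only a sketch. The `total` frame below is a hypothetical stand-in built from random data (the real data is not shown here), and the column names `a` to `d` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `total`; the real data is not part of the question.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
total = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=200),   # positively correlated with a
    "c": -a + rng.normal(scale=0.5, size=200),  # negatively correlated with a
    "d": rng.normal(size=200),                  # roughly uncorrelated
})

corr_df = pd.DataFrame(total.corr().unstack())

# Compare the absolute value so strong negative correlations are kept too.
# The upper bound is the same one used in the question to skip the diagonal.
strength = corr_df[0].abs()
high = corr_df[(strength > 0.5) & (strength < 1.0)]
```

With this change the pair `("a", "c")` is kept even though its correlation is negative, while the filter otherwise behaves like the original one.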
Close enough, but there are duplicates, which is expected because a correlation matrix is symmetric. To clean them up, my idea is to iterate over the high values and drop the mirrored pairs:
In [267]:
for row in high.iterrows():
    print row[0][0], ",", row[0][1]
    print high.loc[row[0][1]].loc[row[0][0]].index
    high.drop(high.loc[row[0][1]].loc[row[0][0]].index)
residual sugar , density
Int64Index([0], dtype='int64')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-267-1258da2a4772> in <module>()
2 print row[0][0], ",", row[0][1]
3 print high.loc[row[0][1]].loc[row[0][0]].index
----> 4 high.drop(high.loc[row[0][1]].loc[row[0][0]].index)
...
[huge stack of errors]
...
KeyError: 0
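A likely reading of the traceback, sketched with a two-row hypothetical copy of `high`: chaining `.loc` twice selects a single row and returns it as a Series, so its `.index` holds the column labels (here just `0`) rather than the row's MultiIndex label. `drop` then looks for a row labelled `0` and fails.

```python
import pandas as pd

# Hypothetical miniature of `high`, using values from the output above.
index = pd.MultiIndex.from_tuples([
    ("residual sugar", "density"),
    ("density", "residual sugar"),
])
high = pd.DataFrame({0: [0.552517, 0.552517]}, index=index)

# Two chained .loc calls return one row as a Series.
row = high.loc["density"].loc["residual sugar"]

# The Series is indexed by the column labels of `high`, not by row labels.
labels = list(row.index)

try:
    high.drop(row.index)
    raised = False
except KeyError:
    raised = True  # no row of `high` is labelled 0
```

This matches the `Int64Index([0], dtype='int64')` printed just before the error.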
The drop method works fine when the index is a regular one (see drop), but how do I build the label when the index is a MultiIndex?
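For what it's worth, here is a minimal sketch of two possible approaches, not a definitive answer. It assumes a rebuilt copy of `high` with the values shown above; the names `dropped` and `unique_pairs` are made up for illustration. A row label in a MultiIndex is a tuple with one entry per level, and `drop` accepts a list of such tuples. The second part avoids the loop by keeping a pair only when its first label sorts before its second.

```python
import pandas as pd

# Hypothetical copy of `high`, using the values shown earlier in the question.
index = pd.MultiIndex.from_tuples([
    ("residual sugar", "density"),
    ("free sulfur dioxide", "total sulfur dioxide"),
    ("total sulfur dioxide", "free sulfur dioxide"),
    ("total sulfur dioxide", "wine"),
    ("density", "residual sugar"),
    ("wine", "total sulfur dioxide"),
])
high = pd.DataFrame(
    {0: [0.552517, 0.720934, 0.720934, 0.700357, 0.552517, 0.700357]},
    index=index,
)

# A MultiIndex row label is a tuple with one entry per level.
# drop returns a new frame; `high` itself is left unchanged.
dropped = high.drop([("density", "residual sugar")])

# Alternative without a loop: keep a pair only when its first label sorts
# before its second, so exactly one of (a, b) and (b, a) survives.
first = high.index.get_level_values(0)
second = high.index.get_level_values(1)
unique_pairs = high[first < second]
```

Note that `drop` is not in-place by default, so the result has to be assigned (as with `dropped` here) for the change to be visible.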