Drop specific multiIndex columns in a pandas dataframe

Question

Suppose one has a dataframe created as such:

import pandas as pd

tdata = {('A', 50): [1, 2, 3, 4],
         ('A', 55): [5, 6, 7, 8],
         ('B', 10): [10, 20, 30, 40],
         ('B', 20): [50, 60, 70, 80],
         ('B', 50): [2, 4, 6, 8],
         ('B', 55): [10, 12, 14, 16]}
tdf = pd.DataFrame(tdata, index=range(0, 4))

   A      B
  50 55  10  20 50  55
0  1  5  10  50  2  10
1  2  6  20  60  4  12
2  3  7  30  70  6  14
3  4  8  40  80  8  16

How would one drop specific columns, say ('B', 10) and ('B', 20) from the dataframe?
Is there a way to drop the columns in one command such as tdf.drop(['B', [10,20]])? Note, I know that my example of the command is by no means close to what it should be, but I hope that it gets the gist across.
Is there a way to drop the columns through some logical expression? For example, say I want to drop columns having the sublevel indices less than 50 (again, the 10, 20 columns). Can I do some general command that would encompass column 'A', even though the 10,20 sublevel indices don't exist or must I specifically reference column 'B'?

Can you explain more what you mean by "Can I do some general command that would encompass column 'A', even though the 10,20 sublevel indices don't exist or must I specifically reference column 'B'?"
– jezrael, Commented Mar 16, 2017 at 14:35
@jezrael Thanks for asking. I was wondering if I could do something like wildcarding the top levels 'A' and 'B' and go after the sublevels that I don't want, something like tdf.drop([:, [10,20]]).
– user1745564, Commented Mar 16, 2017 at 15:38
I don't think that is possible. Slicers can be used only for selecting, not for dropping.
– jezrael, Commented Mar 16, 2017 at 15:41

jezrael · Accepted Answer · 2017-03-16 14:56:39Z

6

You can pass a list of tuples to drop:

print (tdf.drop([('B',10), ('B',20)], axis=1))
   A     B    
  50 55 50  55
0  1  5  2  10
1  2  6  4  12
2  3  7  6  14
3  4  8  8  16

To remove columns by a condition on one level, build a boolean mask from that level's values and select with loc:

mask = tdf.columns.get_level_values(1) >= 50
print (mask)
[ True  True False False  True  True]

print (tdf.loc[:, mask])
   A     B    
  50 55 50  55
0  1  5  2  10
1  2  6  4  12
2  3  7  6  14
3  4  8  8  16
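
The mask can also be written the way the question phrases it ("drop sublevels less than 50"). The following is a minimal sketch, assuming the second level holds numeric labels; the names drop_mask, kept and dropped are illustrative and not part of the original answer.

```python
import pandas as pd

tdata = {('A', 50): [1, 2, 3, 4],
         ('A', 55): [5, 6, 7, 8],
         ('B', 10): [10, 20, 30, 40],
         ('B', 20): [50, 60, 70, 80],
         ('B', 50): [2, 4, 6, 8],
         ('B', 55): [10, 12, 14, 16]}
tdf = pd.DataFrame(tdata, index=range(0, 4))

# True for every column whose second-level label is below 50,
# whatever its top-level label ('A' or 'B') is.
drop_mask = tdf.columns.get_level_values(1) < 50

# Option 1: keep the complement of the mask.
kept = tdf.loc[:, ~drop_mask]

# Option 2: hand the matching column labels to drop.
dropped = tdf.drop(columns=tdf.columns[drop_mask])

print(kept.columns.tolist())
# [('A', 50), ('A', 55), ('B', 50), ('B', 55)]
```

Both options should give the same frame. Because the mask never mentions the top level, 'A' does not need special handling even though it has no 10 or 20 sublevels.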

If you need to remove columns by the labels of a single level, pass those labels and specify the level:

print (tdf.drop([50,55], axis=1, level=1))
    B    
   10  20
0  10  50
1  20  60
2  30  70
3  40  80
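
Applied to the labels from the question, the same level argument appears to give the "wildcard" behaviour asked about in the comments: the top-level labels are never named. A short sketch, using the question's data; the variable name result is illustrative.

```python
import pandas as pd

tdata = {('A', 50): [1, 2, 3, 4],
         ('A', 55): [5, 6, 7, 8],
         ('B', 10): [10, 20, 30, 40],
         ('B', 20): [50, 60, 70, 80],
         ('B', 50): [2, 4, 6, 8],
         ('B', 55): [10, 12, 14, 16]}
tdf = pd.DataFrame(tdata, index=range(0, 4))

# Drop every column whose second-level label is 10 or 20,
# under any top-level label.
result = tdf.drop([10, 20], axis=1, level=1)

print(result.columns.tolist())
# [('A', 50), ('A', 55), ('B', 50), ('B', 55)]
```

Here 'A' is left untouched simply because none of its sublevels match.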

edited Mar 16, 2017 at 14:56

answered Mar 16, 2017 at 14:29

jezrael
