
I have a pandas dataframe with the following column names:

Result1, Test1, Result2, Test2, Result3, Test3, etc...

I want to drop all the columns whose name contains the word "Test". The numbers of such columns is not static but depends on a previous function.

How can I do that?


13 Answers


Here is one way to do this:

df = df[df.columns.drop(list(df.filter(regex='Test')))]
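A minimal sketch of what this line does, using toy data named after the question's column scheme:

```python
import pandas as pd

# Hypothetical frame following the OP's naming pattern.
df = pd.DataFrame({'Result1': [1, 2], 'Test1': [3, 4],
                   'Result2': [5, 6], 'Test2': [7, 8]})

# df.filter(regex='Test') selects the matching columns;
# df.columns.drop(...) removes those labels from the full column Index.
df = df[df.columns.drop(list(df.filter(regex='Test')))]
print(list(df.columns))  # ['Result1', 'Result2']
```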

7 Comments

Or directly in place: df.drop(list(df.filter(regex = 'Test')), axis = 1, inplace = True)
This is a much more elegant solution than the accepted answer. I would break it down a bit more to show why, mainly extracting list(df.filter(regex='Test')) to better show what the line is doing. I would also opt for df.filter(regex='Test').columns over list conversion
I really wonder what the comments calling this answer "elegant" mean. I myself find it quite obfuscated, when Python code should first be readable. It is also twice as slow as the first answer. And it uses the regex keyword when the like keyword seems more adequate.
This is not actually as good an answer as people claim. The problem with filter is that it returns a copy of ALL the data as columns that you want to drop. It is wasteful if you're only passing this result to drop (which again returns a copy)... a better solution would be str.startswith (I've added an answer with that here).
for multiple conditions, this can be done df.drop(df.filter(regex='Test|Rest|Best').columns, axis=1, inplace=True)

Cheaper, Faster, and Idiomatic: str.startswith and str.contains

In recent versions of pandas, you can use string methods directly on the index and columns. For prefix matching, str.startswith is a good fit.

To remove all columns starting with a given substring:

df.columns.str.startswith('Test')
# array([ True, False, False, False])

df.loc[:,~df.columns.str.startswith('Test')]

  toto test2 riri
0    x     x    x
1    x     x    x

For case-insensitive matching, you can use regex-based matching with str.contains with a start-of-line (^) anchor:

df.columns.str.contains('^test', case=False)
# array([ True, False,  True, False])

df.loc[:,~df.columns.str.contains('^test', case=False)] 

  toto riri
0    x    x
1    x    x

If mixed types are a possibility, specify na=False as well.
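A runnable sketch of both variants above (data and column names are toy values):

```python
import pandas as pd

df = pd.DataFrame([['x'] * 4] * 2,
                  columns=['Test1', 'toto', 'test2', 'riri'])

# Case-sensitive prefix match: drops only 'Test1'.
out1 = df.loc[:, ~df.columns.str.startswith('Test')]
# Case-insensitive anchored regex: drops 'Test1' and 'test2'.
out2 = df.loc[:, ~df.columns.str.contains('^test', case=False)]

print(list(out1.columns))  # ['toto', 'test2', 'riri']
print(list(out2.columns))  # ['toto', 'riri']
```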

3 Comments

Hi cs95, can you explain the syntax / thought behind the syntax a bit more? Why do we need to use the colon and comma? Thus why df.loc[:,df....] vs df.loc[df....]?
Where the accepted answer does not work properly for columns ending in _drop in my test data, this solution does work. This should be the accepted answer.
If you want to combine this with the drop method, you can do: df.drop(columns = df.columns[df.columns.str.startswith('Test')], inplace = True)
import pandas as pd
import numpy as np

array = np.random.random((2, 4))
df = pd.DataFrame(array, columns=('Test1', 'toto', 'test2', 'riri'))
print(df)

      Test1      toto     test2      riri
0  0.923249  0.572528  0.845464  0.144891
1  0.020438  0.332540  0.144455  0.741412

cols = [c for c in df.columns if c.lower()[:4] != 'test']
df = df[cols]
print(df)

       toto      riri
0  0.572528  0.144891
1  0.332540  0.741412

1 Comment

The OP didn't specify that the removal should be case insensitive.

This can be done neatly in one line with:

df = df.drop(df.filter(regex='Test').columns, axis=1)

4 Comments

Similarly (and faster): df.drop(df.filter(regex='Test').columns, axis=1, inplace=True)
for multiple conditions, this can be done df.drop(df.filter(regex='Test|Rest|Best').columns, axis=1, inplace=True)
Awesome adaptation of the above solution to filter for multiple conditions! Thank you for posting this :)
@MaxGhenis I don't think doing anything with inplace = True can be considered fast these days, given that the developers are considering removing this parameter altogether.

You can select only the columns you DO want using filter:

import pandas as pd
import numpy as np

data2 = [{'test2': 1, 'result1': 2}, {'test': 5, 'result34': 10, 'c': 20}]

df = pd.DataFrame(data2)

df

      c  result1  result34  test  test2
0   NaN      2.0       NaN   NaN    1.0
1  20.0      NaN      10.0   5.0    NaN

Now filter

df.filter(like='result',axis=1)

Get:

   result1  result34
0      2.0       NaN
1      NaN      10.0

2 Comments

Best answer! Thanks. How do you filter for the opposite, i.e. not like='result'?
then do this: df=df.drop(df.filter(like='result',axis=1).columns,axis=1)

Using a regex to match all columns not containing the unwanted word:

df = df.filter(regex='^((?!badword).)*$')
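A sketch applying this to the question's columns, substituting 'Test' for badword (the negative lookahead matches only names that contain 'Test' nowhere):

```python
import pandas as pd

df = pd.DataFrame(columns=['Result1', 'Test1', 'Result2', 'Test2'])

# '^((?!Test).)*$' succeeds only if no position in the name starts 'Test'.
kept = df.filter(regex='^((?!Test).)*$')
print(list(kept.columns))  # ['Result1', 'Result2']
```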

Comments


Use the DataFrame.select method (note: select has since been deprecated and removed in newer pandas versions; see the comments below):

In [38]: df = DataFrame({'Test1': randn(10), 'Test2': randn(10), 'awesome': randn(10)})

In [39]: df.select(lambda x: not re.search(r'Test\d+', x), axis=1)
Out[39]:
   awesome
0    1.215
1    1.247
2    0.142
3    0.169
4    0.137
5   -0.971
6    0.736
7    0.214
8    0.111
9   -0.214
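Since DataFrame.select was later removed, here is a sketch of an equivalent without the deprecated method, using the same (randomly generated) columns as above:

```python
import re
import numpy as np
import pandas as pd

df = pd.DataFrame({'Test1': np.random.randn(10),
                   'Test2': np.random.randn(10),
                   'awesome': np.random.randn(10)})

# Equivalent of df.select(lambda x: not re.search(r'Test\d+', x), axis=1):
# build the keep-list from the column labels and index with it.
out = df[[c for c in df.columns if not re.search(r'Test\d+', c)]]
print(list(out.columns))  # ['awesome']
```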

4 Comments

And the op did not specify that a number had to follow 'Test': I want to drop all the columns whose name contains the word "Test".
The assumption that a number follows Test is perfectly reasonable. Reread the question.
now seeing: FutureWarning: 'select' is deprecated and will be removed in a future release. You can use .loc[labels.map(crit)] as a replacement
Remember to import re beforehand.

This method does everything in place. Many of the other answers create copies and are not as efficient:

df.drop(df.columns[df.columns.str.contains('Test')], axis=1, inplace=True)
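A minimal sketch (toy column names) showing that only the column Index is consulted before the drop:

```python
import pandas as pd

df = pd.DataFrame(columns=['Result1', 'Test1', 'Result2', 'Test2'])

# df.columns[...] indexes the label Index with a boolean mask;
# no copy of the data is filtered first.
df.drop(df.columns[df.columns.str.contains('Test')], axis=1, inplace=True)
print(list(df.columns))  # ['Result1', 'Result2']
```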

Comments


Question states 'I want to drop all the columns whose name contains the word "Test".'

test_columns = [col for col in df if 'Test' in col]
df.drop(columns=test_columns, inplace=True)

Comments


You can use df.filter to get the list of columns that match your string and then use df.drop

resdf = df.drop(df.filter(like='Test',axis=1).columns.to_list(), axis=1)

2 Comments

This was already covered by this answer.
While the answer linked in the above comment is similar, it is not the same. In fact, it's nearly the opposite.

I do not recommend using the filter method, because it returns a copy of the matching columns, which is not good for larger datasets.

Instead, pandas provides regex filtering of columns using str.match:

df.columns.str.match('.*Test.*')
# array([ True, False, False, False])

(This returns a boolean array for 'Test' anywhere in the column names, not just at the start.)

Use .loc to select the columns via the boolean array. Note that ~ inverts the boolean array, since we want to drop (not keep) all the columns that contain 'Test':

df = df.loc[:, ~df.columns.str.match('.*Test.*')]

In this way, only the column names are needed for the filtering, and we never need to return a copy of the filtered data. Note there are other str methods that can be used on the column names, like startswith and endswith, but match provides the power of regex, so it is the most universal.
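A runnable sketch with toy column names:

```python
import pandas as pd

df = pd.DataFrame(columns=['Test1', 'toto', 'test2', 'riri'])

# str.match is case-sensitive, so 'test2' is not flagged here.
mask = df.columns.str.match('.*Test.*')
print(mask.tolist())       # [True, False, False, False]

df = df.loc[:, ~mask]
print(list(df.columns))    # ['toto', 'test2', 'riri']
```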

Comments


A solution for dropping columns that match a list of name patterns (regex). I prefer this approach because I'm frequently editing the drop list. It builds a negative-lookahead keep regex from the drop list.

import re

drop_column_names = ['A', 'B.+', 'C.*']
# Keep-pattern: matches any name that does not fully match one of the drop patterns.
drop_columns_regex = '^(?!(?:' + '|'.join(drop_column_names) + ')$)'
print('Dropping columns:', ', '.join(c for c in df.columns if not re.search(drop_columns_regex, c)))
df = df.filter(regex=drop_columns_regex, axis=1)

Comments


Building on my preferred answer by @cs95, combining loc with a lambda function enables a nice clean pipe chain like this:

output_df = (
    input_df
    .stuff
    .more_stuff
    .yet_more_stuff
    .loc[:, lambda x: ~x.columns.str.startswith('Test')]
)

This way you can refer to columns of the dataframe produced by pd.DataFrame.yet_more_stuff, rather than the original dataframe input_df itself, as the columns may have changed (depending, of course, on all the stuff).
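The placeholder methods above aren't runnable as-is; a minimal concrete sketch of the same pattern (the assign step and column names are stand-ins for whatever your pipeline does):

```python
import pandas as pd

input_df = pd.DataFrame({'Test1': [1, 2], 'keep': [3, 4]})

# The .loc lambda sees the columns as they exist at that point in the
# chain, so it also drops 'Test_new', which input_df never had.
output_df = (
    input_df
    .assign(Test_new=lambda x: x['Test1'] * 2)
    .loc[:, lambda x: ~x.columns.str.startswith('Test')]
)
print(list(output_df.columns))  # ['keep']
```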

Comments
