Select pandas dataframe columns based on which names contain strings in list

Question

I have a dataframe, df, and a list of strings, cols_needed, which indicate the columns I want to retain in df. The column names in df do not exactly match the strings in cols_needed, so I cannot directly use something like intersection. But the column names do contain the strings in cols_needed. I tried playing around with str.contains but couldn't get it to work. How can I subset df based on cols_needed?

import pandas as pd
df = pd.DataFrame({
    'sim-prod1': [1,2],
    'sim-prod2': [3,4],
    'sim-prod3': [5,6],
    'sim_prod4': [7,8]
})

cols_needed = ['prod1', 'prod2']

# What I want to obtain:
   sim-prod1  sim-prod2
0          1          3
1          2          4

ALollz · Accepted Answer · 2021-03-09 21:21:13Z

3

Use the regex option of DataFrame.filter, joining the strings into one alternation pattern:

df.filter(regex='|'.join(cols_needed))

   sim-prod1  sim-prod2
0          1          3
1          2          4
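Because filter(regex=...) treats the joined string as a regular expression, any entry of cols_needed that contains characters such as . or + would be read as regex syntax rather than matched literally. A minimal sketch, assuming you want each string matched literally, escapes the entries with Python's re.escape before joining them. The string 'sim.prod' below is a hypothetical example chosen only to show the difference.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['prod1', 'prod2']

# re.escape makes characters such as '.', '+' or '(' match literally
# instead of being treated as regex syntax.
pattern = '|'.join(map(re.escape, cols_needed))

# filter(regex=...) keeps every column whose label matches the pattern
# anywhere in the name, preserving the DataFrame's column order.
subset = df.filter(regex=pattern)
print(subset)

# Why escaping matters (hypothetical string 'sim.prod'): an unescaped '.'
# matches any character, so it matches both 'sim-' and 'sim_' labels.
unescaped = df.filter(regex='sim.prod')
escaped = df.filter(regex=re.escape('sim.prod'))
print(list(unescaped.columns))  # ['sim-prod1', 'sim-prod2', 'sim-prod3', 'sim_prod4']
print(list(escaped.columns))    # []
```

For plain alphanumeric strings like 'prod1' the escaped and unescaped patterns are identical, so escaping costs nothing in the simple case.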


Quang Hoang · Accepted Answer · 2021-03-09 21:18:33Z

3

You can use str.contains with a single pattern that joins the strings with |, for example:

df.loc[:, df.columns.str.contains('|'.join(cols_needed))]

Output:

   sim-prod1  sim-prod2
0          1          3
1          2          4
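On a column Index, str.contains returns one boolean per label, and that mask is what df.loc[:, ...] uses to pick columns. A small sketch, assuming the requested strings might differ in case from the column names (the mixed-case cols_needed here is hypothetical), shows the mask itself and the case=False parameter of str.contains:

```python
import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['PROD1', 'Prod2']  # hypothetical mixed-case input

# One boolean per column label; case=False makes the regex match
# regardless of upper/lower case.
mask = df.columns.str.contains('|'.join(cols_needed), case=False)
print(mask.tolist())  # [True, True, False, False]

# Select all rows and only the columns where the mask is True.
subset = df.loc[:, mask]
print(subset)
```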


sammywemmy · Accepted Answer · 2021-03-27 23:30:39Z

3

A list comprehension could work as well:

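# Keep each column whose name contains at least one of the wanted strings;
# any() also prevents a column from being listed twice if it matches several.
columns = [col for col in df.columns
           if any(needed in col for needed in cols_needed)]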

['sim-prod1', 'sim-prod2']

In [110]: df.loc[:, columns]
Out[110]: 
   sim-prod1  sim-prod2
0          1          3
1          2          4
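The comprehension keeps the DataFrame's own column order. If you would rather have the result follow the order of cols_needed, a possible variant (a sketch, with a hypothetical reordered cols_needed) swaps the loops and uses dict.fromkeys to drop repeats while keeping first-seen order:

```python
import pandas as pd

df = pd.DataFrame({
    'sim-prod1': [1, 2],
    'sim-prod2': [3, 4],
    'sim-prod3': [5, 6],
    'sim_prod4': [7, 8],
})
cols_needed = ['prod2', 'prod1']  # hypothetical: requested in a different order

# Outer loop over cols_needed so the result follows the requested order;
# dict.fromkeys removes duplicates in case one column name contains more
# than one of the requested strings.
columns = list(dict.fromkeys(
    col
    for needed in cols_needed
    for col in df.columns
    if needed in col
))
print(columns)  # ['sim-prod2', 'sim-prod1']
print(df[columns])
```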


1 Comment

tdy Over a year ago

Nice, or just df[columns] in this case
