Identify where pandas dataframe columns match string

Question

I have a pandas DataFrame as shown below. I want to identify the index values of the columns in df whose names match a given string (more specifically, the part of each column name that follows the 'sim-' or 'act-' prefix).

# Sample df
import pandas as pd
df = pd.DataFrame({
    'sim-prod1': [1, 1.4],
    'sim-prod2': [2, 2.1],
    'act-prod1': [1.1, 1],
    'act-prod2': [2.5, 2]
})

# Get unique prod values from df.columns
prods = pd.Series(df.columns[1:]).str[4:].unique()
prods
  array(['prod2', 'prod1'], dtype=object)

I now want to loop through prods and identify the columns where prod1 and prod2 occur, and then use those columns to create new dataframes. How can I do this? In R I could use the which function to do this easily. Example dataframes I want to obtain are below.

df_prod1
   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0

df_prod2
   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0

Quang Hoang · Accepted Answer · 2021-03-04 16:03:34Z

2

Try groupby with axis=1:

# Group columns by the text after the 4-character 'sim-'/'act-' prefix
for prod, d in df.groupby(df.columns.str[4:], axis=1):
    print(f'this is {prod}')
    print(d)
    print('='*20)

Output:

this is prod1
   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0
====================
this is prod2
   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0
====================

Now, to keep them as variables, store them in a dictionary keyed by product name:

dfs = {prod: d for prod, d in df.groupby(df.columns.str[4:], axis=1)}
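A note for newer pandas versions: groupby(..., axis=1) was deprecated in pandas 2.1, and the deprecation warning suggests grouping the transposed frame instead. Below is a minimal sketch of that variant, not part of the original answer. It assumes the sample df from the question and that every column name starts with a 4-character prefix ('sim-' or 'act-'); the name keys is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'sim-prod1': [1, 1.4],
    'sim-prod2': [2, 2.1],
    'act-prod1': [1.1, 1],
    'act-prod2': [2.5, 2],
})

# Text after the 4-character prefix, one entry per column (assumed layout)
keys = df.columns.str[4:]

# Transpose so each original column becomes a row, group those rows by key,
# then transpose every group back to the original orientation.
dfs = {prod: group.T for prod, group in df.T.groupby(keys)}

print(dfs['prod1'])
print(dfs['prod2'])
```

Transposing is harmless here because all columns are numeric; with mixed column dtypes the round trip may upcast values to object, so a mask-based selection might be preferable in that case.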


Scott Boston · Accepted Answer · 2021-03-04 16:05:15Z

2

Try this, storing the parts of the dataframe as a dictionary:

df_dict = dict(tuple(df.groupby(df.columns.str[4:], axis=1)))

print(df_dict['prod1'])
print('\n')
print(df_dict['prod2'])

Output:

   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0


   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0
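If the prefixes are not guaranteed to be exactly four characters long, the grouping key can be derived by stripping the prefix explicitly, and the dictionary can be built with boolean column masks instead of groupby. This is a sketch under the assumption that the only prefixes are 'sim-' and 'act-', as in the question; products is an illustrative name.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'sim-prod1': [1, 1.4],
    'sim-prod2': [2, 2.1],
    'act-prod1': [1.1, 1],
    'act-prod2': [2.5, 2],
})

# Strip a leading 'sim-' or 'act-' rather than slicing a fixed number of characters
products = df.columns.str.replace(r'^(?:sim|act)-', '', regex=True)

# One sub-frame per product, selected with a boolean mask over the columns
df_dict = {p: df.loc[:, products == p] for p in products.unique()}

print(df_dict['prod1'])
print(df_dict['prod2'])
```

Because this avoids groupby entirely, it does not depend on the axis argument.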


Anurag Dabas · Accepted Answer · 2021-03-04 16:08:04Z

1

You can also do this without groupby() or a for loop:

df_prod2=df[df.columns[df.columns.str.contains(prods[0])]]
df_prod1=df[df.columns[df.columns.str.contains(prods[1])]]

1 Comment

Gaurav Bansal Over a year ago

Only problem here is that in reality I have many prods, so writing them out one by one won't work.
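To address the comment above, the same kind of selection can be run in a loop over every product. This is also close to what R's which does: np.flatnonzero returns the integer positions where a boolean mask is True. The sketch below assumes the sample df from the question; positions and frames are illustrative names. It uses str.endswith rather than str.contains, because contains does substring matching and 'prod1' would also match a column such as 'sim-prod10'.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'sim-prod1': [1, 1.4],
    'sim-prod2': [2, 2.1],
    'act-prod1': [1.1, 1],
    'act-prod2': [2.5, 2],
})

# Unique product names taken from the text after the 4-character prefix
prods = df.columns.str[4:].unique()

# Integer column positions per product (the analogue of R's which)
positions = {
    prod: np.flatnonzero(np.asarray(df.columns.str.endswith(prod), dtype=bool))
    for prod in prods
}

# One sub-frame per product, selected by position
frames = {prod: df.iloc[:, idx] for prod, idx in positions.items()}

for prod, idx in positions.items():
    print(prod, idx.tolist())
    print(frames[prod])
```

This scales to any number of products, since nothing is written out by hand.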
