1

I have a pandas DataFrame as shown below. I want to identify the index values of the columns in df that match a given string (more specifically, a string that matches the column names after 'sim-' or 'act-').

# Sample df
import pandas as pd
df = pd.DataFrame({
    'sim-prod1': [1, 1.4],
    'sim-prod2': [2, 2.1],
    'act-prod1': [1.1, 1],
    'act-prod2': [2.5, 2]
})

# Get unique prod values from df.columns
prods = pd.Series(df.columns[1:]).str[4:].unique()
prods
  array(['prod2', 'prod1'], dtype=object)

I now want to loop through prods and identify the columns where prod1 and prod2 occur, and then use those columns to create new dataframes. How can I do this? In R I could use the which function to do this easily. Example dataframes I want to obtain are below.

df_prod1
    sim_prod1   act_prod1
0   1.0         1.1
1   1.4         1.0

df_prod2
    sim_prod2   act_prod2
0   2.0         2.5
1   2.1         2.0

3 Answers 3

2

Try groupby with axis=1:

for prod, d in df.groupby(df.columns.str[-4:], axis=1):
    print(f'this is {prod}')
    print(d)
    print('='*20)

Output:

this is rod1
   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0
====================
this is rod2
   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0
====================

Now, to have them as variables:

dfs = {prod:d for prod, d in df.groupby(df.columns.str[-4:], axis=1)}
Sign up to request clarification or add additional context in comments.

Comments

2

Try this, storing the parts of the dataframe as a dictionary:

df_dict = dict(tuple(df.groupby(df.columns.str[4:], axis=1)))

print(df_dict['prod1'])
print('\n')
print(df_dict['prod2'])

Output:

   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0


   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0

Comments

1

You can also do this without using groupby() and for loop by:-

df_prod2=df[df.columns[df.columns.str.contains(prods[0])]]
df_prod1=df[df.columns[df.columns.str.contains(prods[1])]]

1 Comment

Only problem here is that in reality I have many prods, so writing them out one by one won't work.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.