Using a for loop variable as input for a dataframe name

Question

I have a dataframe constructed as follows:

import pandas as pd

df = pd.DataFrame({"taxon": ["taxa1", "taxa2", "taxa3", "taxa4", "taxa5"], "rank": ["genus", "genus", "family", "species", "species"]})

There are 3 different ranks in this example dataframe: genus, family and species. I want to split df into one new dataframe per rank, each holding only the rows with that rank. The name of each new dataframe should be df_ followed by the name of the rank.

So as output I want 3 dataframes: df_genus, df_family, and df_species. Each should contain the rows of the original df that have that rank.

I already tried several things, including:

ranks = ["genus","family","species"] 
for rank in ranks:
    "df_"+str(rank) = df.loc[df["rank"]==rank]

but this raises an error: SyntaxError: can't assign to operator

How can I perform this operation?

Use a dict, with keys df_genus, df_family and df_species.
– Chris Adams, Commented Mar 12, 2020 at 11:26
Use d = {f'df_{k}': v for k, v in df.groupby('rank')} and access it like d['df_genus'] etc. It's not a good idea to try to create variable names dynamically. See this post for more info.
– Chris Adams, Commented Mar 12, 2020 at 11:33
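
For reference, the dict approach from the comment above might look like the following. This is a minimal sketch, assuming the example df from the question; the name frames is only illustrative.

```python
import pandas as pd

# Example dataframe from the question
df = pd.DataFrame({
    "taxon": ["taxa1", "taxa2", "taxa3", "taxa4", "taxa5"],
    "rank": ["genus", "genus", "family", "species", "species"],
})

# Build one sub-dataframe per rank, keyed by "df_<rank>".
# groupby("rank") yields (rank value, rows with that rank) pairs.
frames = {f"df_{rank}": group for rank, group in df.groupby("rank")}

# Look up a dataframe by its key instead of by a variable name
print(frames["df_genus"])
```

Because the ranks come from the data itself, this should also pick up any rank that is not in a hand-written list, and looping over frames.items() gives every name and dataframe together.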

Arun AK · Accepted Answer · 2020-03-12 11:36:12Z

You can use globals() to create a separately named dataframe for each rank inside the loop.

ranks = ["genus","family","species"] 
for rank in ranks:
    globals()["df_"+str(rank)] = df.loc[df["rank"]==rank]

Hope it helps :)
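
A self-contained version of the loop above, as a sketch that assumes pandas is imported as pd and df is the example from the question. Note that globals() writes to the module-level namespace, so the new names are module globals even if the loop runs inside a function, and editors or linters will generally not know that names such as df_species exist.

```python
import pandas as pd

# Example dataframe from the question
df = pd.DataFrame({
    "taxon": ["taxa1", "taxa2", "taxa3", "taxa4", "taxa5"],
    "rank": ["genus", "genus", "family", "species", "species"],
})

ranks = ["genus", "family", "species"]
for rank in ranks:
    # Insert a module-level name "df_<rank>" bound to the matching rows
    globals()[f"df_{rank}"] = df.loc[df["rank"] == rank]

# The dynamically created names can now be used like ordinary variables
print(df_species)
```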

