Create new dataframes in python pandas based on the value of a column

Question

I have a dataset that looks like that:

There are 15 unique values in the column 'query id', so I am trying to create new dataframes for each unique value. I thought of having a loop for every unique value in column 'query id' with a code like this:

df_list = []
i = 0

for x in df['query id'].unique():
    df{i} = pd.DataFrame(columns=df.columns) 
    df_list.append()
    i+=1

But I am definitely doing something wrong there and got stuck. Do you have any ideas of how to do that?

Sample dataset:

relevance   query id   1   2   3
        1   WT04-170  10  40  80
        1   WT04-170  20  60  70
        1   WT04-176  30  70  50     
        1   WT04-176  40  90  20      
        1   WT04-173  50 100  10

grey_ranger · Accepted Answer · 2022-11-15 18:33:24Z

2

Pandas has a built-in function for iterating unique values in a column and selecting the matching rows. The function is groupby

In your case, you can create the dictionary as a one-liner using:

dfs = {query_id: grp.copy() for query_id, grp in df.groupby("query id")}

Once you have your dictionary of dataframes, you can access each one using the query id as your key:

my_df = dfs["WT04-170"]  # Access each dataframe using the appropriate key
my_df.describe()  # Do your work with the dataframe here

edited Nov 15, 2022 at 18:33

answered Nov 15, 2022 at 18:10

grey_ranger

1,0321 gold badge10 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

mariant Over a year ago

It kind of worked, but how can I then access a particular dataframe? For example, if I want to work with a dataframe for query X? I thought of saving all of these newly generated dataftames in a list like: [df1,df2,df3 etc].. I also added a sample dataset to my question in case it can help!

grey_ranger Over a year ago

Thanks for adding a dataset, that always makes it easier for someone to help! You can access dataframes from the dictionary using the query id as the key, the answer is updated with an example.

BeRT2me Over a year ago

If you just want a list, you can use this instead:[grp.copy() for query_id, grp in df.groupby("query id")], but a dictionary is far better organized~

KJDII · Accepted Answer · 2022-11-15 17:03:47Z

0

Does something like this help?


df_list = []

for x in set(df['query id'].to_list()):
    df = df[df['query id'] == x].copy() 
    df_list.append(df)

answered Nov 15, 2022 at 17:03

KJDII

8616 silver badges11 bronze badges

Comments

markd227 · Accepted Answer · 2022-11-15 18:43:12Z

0

It sounds like what you want is a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df. Is that right?

In that case, your approach is close, but you could just filter the df in your loop. I also used a dict to store the resulting dataframes too, but you could do it with the list as well.

If my understanding of what you're looking for is correct, I think this should work for you:

df_dict = {}
for (i,x) in enumerate(df['query id'].unique()):
    df_dict[i] = df[df['query id']==x].copy()

You could also just use the query_ids as the dict keys too, like this:

df_dict = {}
for x in df['query id'].unique():
    df_dict[x] = df[df['query id']==x].copy()

edited Nov 15, 2022 at 18:43

answered Nov 15, 2022 at 17:10

markd227

3151 gold badge2 silver badges7 bronze badges

3 Comments

mariant Over a year ago

That makes sense; though gives me a syntax error pointing at df_dict{I}. I think it doesn't like the {i} thing..

mariant Over a year ago

Btw, that's exactly what I am trying to do - 'a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df'

markd227 Over a year ago

Oh sorry - typo - should be [ ] not { } - will edit. The {} define the dict, but to create a key:value pair use [ ]

Collectives™ on Stack Overflow

Create new dataframes in python pandas based on the value of a column

3 Answers 3

3 Comments

Comments

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related