I have a dataset like the sample shown at the end of this question.

There are 15 unique values in the column 'query id', and I am trying to create a new dataframe for each unique value. I thought of looping over every unique value in the column 'query id' with code like this:

df_list = []
i = 0

for x in df['query id'].unique():
    df{i} = pd.DataFrame(columns=df.columns) 
    df_list.append()
    i+=1

But I am definitely doing something wrong there and am stuck. Do you have any ideas on how to do this?

Sample dataset:

relevance   query id   1   2   3
        1   WT04-170  10  40  80
        1   WT04-170  20  60  70
        1   WT04-176  30  70  50     
        1   WT04-176  40  90  20      
        1   WT04-173  50 100  10

3 Answers

Pandas has a built-in method for splitting a dataframe by the unique values in a column and selecting the matching rows: groupby.

In your case, you can build a dictionary of dataframes, keyed by query id, as a one-liner:

dfs = {query_id: grp.copy() for query_id, grp in df.groupby("query id")}

Once you have your dictionary of dataframes, you can access each one using the query id as your key:

my_df = dfs["WT04-170"]  # Access each dataframe using the appropriate key
my_df.describe()  # Do your work with the dataframe here
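
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample dataset from the question and splits it with groupby. It assumes the numeric column names 1, 2 and 3 are stored as strings; adjust if yours are integers.

```python
import pandas as pd

# Sample data copied from the question
# (assumption: the columns named 1, 2, 3 are strings here)
df = pd.DataFrame({
    "relevance": [1, 1, 1, 1, 1],
    "query id": ["WT04-170", "WT04-170", "WT04-176", "WT04-176", "WT04-173"],
    "1": [10, 20, 30, 40, 50],
    "2": [40, 60, 70, 90, 100],
    "3": [80, 70, 50, 20, 10],
})

# One independent dataframe per unique query id
dfs = {query_id: grp.copy() for query_id, grp in df.groupby("query id")}

print(sorted(dfs))       # the query ids available as keys
print(dfs["WT04-170"])   # only the rows for WT04-170
```

The .copy() call is there so that later changes to one of the smaller dataframes do not affect the original df.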

3 Comments

It kind of worked, but how can I then access a particular dataframe? For example, if I want to work with the dataframe for query X? I thought of saving all of these newly generated dataframes in a list like [df1, df2, df3, etc.]. I also added a sample dataset to my question in case it helps!
Thanks for adding a dataset, that always makes it easier for someone to help! You can access dataframes from the dictionary using the query id as the key. The answer is updated with an example.
If you just want a list, you can use this instead: [grp.copy() for query_id, grp in df.groupby("query id")], but a dictionary is far better organized.

Does something like this help?


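df_list = []

# unique() keeps the query ids in order of first appearance
for x in df['query id'].unique():
    # Use a new name here: reassigning df would discard the other query ids
    subset = df[df['query id'] == x].copy()
    df_list.append(subset)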
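
One detail worth checking in a loop like this: the filtered result should go into its own variable. If it is assigned back to the dataframe being filtered, every later iteration filters an already reduced dataframe and comes back empty. The sketch below shows both variants on a tiny made-up dataframe; the names shrinking, buggy_list and subset are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "query id": ["WT04-170", "WT04-170", "WT04-176"],
    "1": [10, 20, 30],
})

# Pitfall: the filtered result overwrites the dataframe being filtered
shrinking = df.copy()
buggy_list = []
for x in df["query id"].unique():
    shrinking = shrinking[shrinking["query id"] == x].copy()
    buggy_list.append(shrinking)

# Safer: always filter the full df and store the result under a new name
df_list = []
for x in df["query id"].unique():
    subset = df[df["query id"] == x].copy()
    df_list.append(subset)

print([len(d) for d in buggy_list])  # [2, 0]
print([len(d) for d in df_list])     # [2, 1]
```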




It sounds like what you want is a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df. Is that right?

In that case, your approach is close, but you can simply filter df inside your loop. I used a dict to store the resulting dataframes, but a list would work as well.

If my understanding of what you're looking for is correct, I think this should work for you:

df_dict = {}
for i, x in enumerate(df['query id'].unique()):
    df_dict[i] = df[df['query id'] == x].copy()

You could also use the query ids themselves as the dict keys, like this:

df_dict = {}
for x in df['query id'].unique():
    df_dict[x] = df[df['query id']==x].copy()
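
Once the dict is built, you can loop over it to do per-query work. Here is a small sketch along those lines, using a cut-down version of the sample data from the question; row_counts is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "relevance": [1, 1, 1, 1, 1],
    "query id": ["WT04-170", "WT04-170", "WT04-176", "WT04-176", "WT04-173"],
    "1": [10, 20, 30, 40, 50],
})

# Build one filtered dataframe per query id, keyed by the id itself
df_dict = {}
for x in df["query id"].unique():
    df_dict[x] = df[df["query id"] == x].copy()

# Iterate over the dict to work with each dataframe in turn
row_counts = {query_id: len(sub_df) for query_id, sub_df in df_dict.items()}
print(row_counts)  # {'WT04-170': 2, 'WT04-176': 2, 'WT04-173': 1}
```

Keying by the query id means you can go straight to df_dict["WT04-176"] without having to remember which position it was stored at.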

3 Comments

That makes sense, though it gives me a syntax error pointing at df_dict{i}. I think it doesn't like the {i} thing.
Btw, that's exactly what I am trying to do: 'a filtered dataframe for each unique query id. So you would end up with 15 dataframes, each containing only the rows for that specific query id from the combined df'
Oh sorry, that was a typo. It should be [ ], not { }. I will edit. The { } define the dict, but to create a key:value pair you use [ ].
