Using a loop function to filter a dataframe into a list of dataframes

Question

I have a small dataframe, two columns wide. My goal is to split this dataframe into a list of dataframes, based on unique values from the QE column.

I can't seem to locate the error in my code.

Edited for clarity:

import pandas as pd

def Function1():
    data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
            'QE': ['12.31.2019', '12.31.2019', '12.31.2019', '03.31.2020', '03.31.2020', '03.31.2020']
            }
    df = pd.DataFrame(data, columns=['Name', 'QE'])
    
    Quarters = list(df['QE'].unique())
               
    dfs = []
    for x in Quarters:
        df = df[df['QE'] == x]
        df = df['Name'].reset_index(drop=True) 
        dfs.append(df)
    
    return df

a = Function1()

KeyError: 'QE'
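The KeyError is raised on the second pass through the loop. The line df = df['Name'].reset_index(drop=True) overwrites the original dataframe with a Series of names, so on the next iteration df['QE'] looks up a label that no longer exists. The function also returns df rather than the list dfs. Below is a minimal sketch that keeps the question's boolean-filter approach but stores each filtered result in its own variable (quarter_df and quarter_names are illustrative names). It also uses df[['Name']] instead of df['Name'], on the assumption that one-column DataFrames rather than Series are wanted:

```python
import pandas as pd


def Function1():
    data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
            'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
                   '03.31.2020', '03.31.2020', '03.31.2020']}
    df = pd.DataFrame(data, columns=['Name', 'QE'])

    # unique() keeps the order in which each quarter first appears.
    quarters = df['QE'].unique()

    dfs = []
    for x in quarters:
        # Filter into a new variable so df stays intact for the next
        # iteration (reassigning df is what caused the KeyError).
        quarter_df = df[df['QE'] == x]
        # Double brackets keep a one-column DataFrame (assumption: a
        # DataFrame is wanted); reset_index renumbers rows from 0.
        quarter_names = quarter_df[['Name']].reset_index(drop=True)
        dfs.append(quarter_names)

    # Return the whole list, not the last filtered object.
    return dfs


a = Function1()
print(a[1])
```

With the sample data, a[0] holds Dave, Sue and John from 12.31.2019 and a[1] holds Dave, Michael and Sue from 03.31.2020, each renumbered from 0.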

Can you add your dataframe? Please see How to Ask and minimal reproducible example. - Umar.H, commented Aug 6, 2020 at 16:19
It seems that the column QE is not in your file. Which Python version are you using? Running your code on Python 2.7 with a simple CSV file, it works. - Carlo Zanocco, commented Aug 6, 2020 at 16:21
Did you check whether the column exists in the data frame? Try running df.columns and see if it is there. - Puneet Singh, commented Aug 6, 2020 at 16:31
@Manakin I have edited the code to create a reproducible example. Thanks for the suggestion. - Dylan Moore, commented Aug 6, 2020 at 16:45
Also: working with dataframes in a dictionary instead of a list can come in handy, because you can access each dataframe by key instead of by its position in the list. - Sanoj, commented Aug 6, 2020 at 17:57
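Following Sanoj's suggestion, the same groupby iteration used in the answer below can fill a dictionary keyed by the QE value instead of a list. A minimal sketch using the question's sample data (dfs_by_quarter is an illustrative name):

```python
import pandas as pd

data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
        'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
               '03.31.2020', '03.31.2020', '03.31.2020']}
df = pd.DataFrame(data)

# Each groupby iteration yields (key, sub-dataframe); keep the key
# (the QE string) as the dictionary key instead of discarding it.
dfs_by_quarter = {quarter: group for quarter, group in df.groupby('QE')}

# Look up a quarter by its label rather than by list position.
print(dfs_by_quarter['12.31.2019'])
```

dict(tuple(df.groupby('QE'))) builds the same mapping more tersely.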

Umar.H · Accepted Answer · 2020-08-06 17:23:16Z


Use a list comprehension and groupby:

dfs = [dataframe for _, dataframe in df.groupby('QE')]


print(dfs)

[      Name          QE
 3     Dave  03.31.2020
 4  Michael  03.31.2020
 5      Sue  03.31.2020,    Name          QE
 0  Dave  12.31.2019
 1   Sue  12.31.2019
 2  John  12.31.2019]

print(dfs[1])

   Name          QE
0  Dave  12.31.2019
1   Sue  12.31.2019
2  John  12.31.2019
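The 03.31.2020 group comes first because groupby sorts the group keys by default, and QE holds strings, so '03...' sorts before '12...'. If the list should follow the original row order instead, groupby accepts sort=False; for chronological order, the strings can be parsed as dates first. A sketch, assuming QE is always formatted month.day.year (QE_date is an illustrative column name):

```python
import pandas as pd

data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
        'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
               '03.31.2020', '03.31.2020', '03.31.2020']}
df = pd.DataFrame(data)

# Default: keys are sorted as strings, so '03.31.2020' comes first.
by_string = [g for _, g in df.groupby('QE')]

# sort=False keeps groups in the order their keys first appear.
by_appearance = [g for _, g in df.groupby('QE', sort=False)]

# Assumption: QE is month.day.year; parse it so sorting is chronological.
df['QE_date'] = pd.to_datetime(df['QE'], format='%m.%d.%Y')
by_date = [g for _, g in df.groupby('QE_date')]

print(by_string[0]['QE'].iloc[0])      # 03.31.2020
print(by_appearance[0]['QE'].iloc[0])  # 12.31.2019
print(by_date[0]['QE'].iloc[0])        # 12.31.2019
```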

In a standard for loop, this would be:

dfs = []
for _, dataframe in df.groupby('QE'):
    dfs.append(dataframe)
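Because the loop body is ordinary Python, any per-group cleanup can happen before each append. A sketch with the question's sample data; the particular cleanup shown (dropping the constant QE column and renumbering rows from 0) is only an illustrative assumption:

```python
import pandas as pd

data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
        'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
               '03.31.2020', '03.31.2020', '03.31.2020']}
df = pd.DataFrame(data)

dfs = []
for _, dataframe in df.groupby('QE'):
    # Illustrative cleanup only: QE is the same for every row in a
    # group, so drop it, then renumber the rows starting from 0.
    cleaned = dataframe.drop(columns='QE').reset_index(drop=True)
    dfs.append(cleaned)

print(dfs[0])
```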

edited Aug 6, 2020 at 17:23

answered Aug 6, 2020 at 16:48

Umar.H



3 Comments

Dylan Moore

This works, thank you for your response. I understand that the "_," is necessary in the solution to get my desired result, but I am not familiar with that syntax.

Umar.H

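The _ is just a placeholder. Iterating over df.groupby('QE') yields (key, dataframe) pairs, where the key is the group's QE value, not the dataframe's index. Since we aren't using the key here, we assign it to _ to ignore it. @DylanMoore see edit.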

Dylan Moore

Your second solution is actually preferable for me, as I have some other dataframe cleaning to do within the loop. Thanks for the help and for the explanation!
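To see what the _ placeholder receives, the key can be given a name and printed. A small sketch with the question's sample data, showing that each iteration yields the QE value together with the matching rows:

```python
import pandas as pd

data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
        'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
               '03.31.2020', '03.31.2020', '03.31.2020']}
df = pd.DataFrame(data)

pairs = list(df.groupby('QE'))
for key, group in pairs:
    # key is the QE value shared by every row in this group;
    # group is the sub-dataframe holding those rows.
    print(key, list(group['Name']))

# 03.31.2020 ['Dave', 'Michael', 'Sue']
# 12.31.2019 ['Dave', 'Sue', 'John']
```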
