I have a small dataframe, two columns wide. My goal is to split this dataframe into a list of dataframes, based on unique values from the QE column.

I can't seem to locate the error in my code.

Edited for clarity:

import pandas as pd

def Function1():
    data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
            'QE': ['12.31.2019', '12.31.2019', '12.31.2019', '03.31.2020', '03.31.2020', '03.31.2020']
            }
    df = pd.DataFrame(data, columns=['Name', 'QE'])
    
    Quarters = list(df['QE'].unique())
               
    dfs = []
    for x in Quarters:
        df = df[df['QE'] == x]
        df = df['Name'].reset_index(drop=True) 
        dfs.append(df)
    
    return df

a = Function1()

KeyError: 'QE'
  • Can you add your dataframe? Please see How to Ask and minimal reproducible example. Commented Aug 6, 2020 at 16:19
  • It seems that the column QE is not there in your file. Which python version are you using? Running your code on python 2.7 with a simple csv file it works. Commented Aug 6, 2020 at 16:21
  • Did you check if the column exists in the data frame? Try running df.columns and see if it exists. Commented Aug 6, 2020 at 16:31
  • @Manakin I have edited the code to create a reproducible example. Thanks for the suggestion. Commented Aug 6, 2020 at 16:45
  • Also: Working with dataframes in a dictionary instead of a list can come in handy, because you could assign keys instead of calling the dataframes via their position in the list Commented Aug 6, 2020 at 17:57
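
The dictionary idea from the last comment can be sketched as follows. This is only an illustration that rebuilds the sample df from the question; the name dfs_by_quarter is made up here and does not come from the thread.

```python
import pandas as pd

# Sample data copied from the question.
data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
        'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
               '03.31.2020', '03.31.2020', '03.31.2020']}
df = pd.DataFrame(data, columns=['Name', 'QE'])

# Map each unique QE value to the sub-dataframe holding its rows.
# dfs_by_quarter is a hypothetical name chosen for this sketch.
dfs_by_quarter = {quarter: group for quarter, group in df.groupby('QE')}

# Look a quarter up by its label rather than by its position in a list.
print(dfs_by_quarter['12.31.2019'])
```

Each piece can then be fetched by its quarter label, so the code does not depend on the order in which the groups were produced.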

1 Answer

Use a list comprehension with groupby:

dfs = [dataframe for _, dataframe in df.groupby('QE')]


print(dfs)

[      Name          QE
 3     Dave  03.31.2020
 4  Michael  03.31.2020
 5      Sue  03.31.2020,    Name          QE
 0  Dave  12.31.2019
 1   Sue  12.31.2019
 2  John  12.31.2019]

print(dfs[1])

   Name          QE
0  Dave  12.31.2019
1   Sue  12.31.2019
2  John  12.31.2019

In a standard for loop, this would be:

dfs = []
for _, dataframe in df.groupby('QE'):
    dfs.append(dataframe)
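
As for the KeyError itself: inside the question's loop, df is reassigned, first to the filtered rows and then to the Name column alone, which is a Series. On the second pass, df['QE'] is looked up on that Series and fails. The function also returns df instead of dfs. Below is a minimal sketch of the original loop with those two points changed; the names split_by_quarter and subset are made up for illustration.

```python
import pandas as pd

def split_by_quarter(df):
    """Return one Name Series per unique QE value (hypothetical helper)."""
    dfs = []
    for quarter in df['QE'].unique():
        # Use a separate name so the full dataframe is never overwritten.
        subset = df[df['QE'] == quarter]
        # Keep only the names and restart the index at 0, as in the question.
        dfs.append(subset['Name'].reset_index(drop=True))
    return dfs  # return the whole list, not the last piece

# Sample data copied from the question.
data = {'Name': ['Dave', 'Sue', 'John', 'Dave', 'Michael', 'Sue'],
        'QE': ['12.31.2019', '12.31.2019', '12.31.2019',
               '03.31.2020', '03.31.2020', '03.31.2020']}
df = pd.DataFrame(data, columns=['Name', 'QE'])

dfs = split_by_quarter(df)
print(len(dfs))
```

Note that unique() keeps the quarters in the order they first appear, whereas groupby sorts the group keys by default.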
3 Comments

This works, thank you for your response. I understand that the "_," is necessary in the solution, as my desired result requires it. But I am not familiar with the code.
The _ is basically a filler. groupby yields pairs of (group key, dataframe), and in this instance we aren't using the key, so we ignore it. @DylanMoore see edit.
Your second solution is actually preferable for me, as I have some other dataframe cleaning to do within the loop. Thanks for the help and for the explanation!