I have a pandas data frame, df, that has 4 columns and a lot of rows.

I want to create 5 different data frames based on the value of one of the columns of the data frame. The column I am referring to is called color.

color has 5 unique values: red, blue, green, yellow, orange.

Each of the 5 new data frames should contain all rows that have one particular value in color. For instance, df_blue should have all the rows (with all columns) of df where the value in the color column is blue.

The code I have is the following:

# create 5 new data frames
df_red = []
df_blue= []
df_green= []
df_yellow= []
df_orange= []
for i in range(len(df)):
    if df['color'] == "blue"
       df_blue.append(df)

# i would do if-else statements to satisfy all 5 colors

I feel I am missing some logic...any suggestions or comments?

Thanks!

2 Answers

You need to use groupby. The following code fragment creates a sample DataFrame and converts it into a dictionary where colors are keys and the matching dataframes are values:

import pandas as pd

df = pd.DataFrame({'color': ['red','blue','red','green','blue'],
                   'foo': [1,2,3,4,5]})
# Map each distinct color to the sub-DataFrame holding its rows
colors = {color: dfc for color, dfc in df.groupby('color')}
#{'blue':   color  foo
#         1  blue    2
#         4  blue    5, 
# 'green':    color  foo
#          3  green    4, 
# 'red':   color  foo
#        0   red    1
#        2   red    3}
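
To get at one of the per-color frames, index the dictionary by color. Below is a minimal sketch, assuming the same sample df as above; the names df_blue and df_red are only illustrative, and get_group is shown as an alternative when a single group is all that is needed.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'blue'],
                   'foo': [1, 2, 3, 4, 5]})

# One DataFrame per distinct color, keyed by the color value
colors = {color: dfc for color, dfc in df.groupby('color')}

# Look up a single color's rows by key
df_blue = colors['blue']
print(df_blue)

# Alternative: ask the groupby object for one group directly
df_red = df.groupby('color').get_group('red')
print(df_red)
```

Because the keys come from the data, this keeps working if a color is added or removed later.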

6 Comments

Dictionary comprehension nice. +1
I get this error AttributeError: 'dict' object has no attribute 'groupby', also how can I do using data frames? I tried data.groupby('color') and it outputs <pandas.core.groupby.DataFrameGroupBy object at 0x000000003193DF28>. I want to have 5 data frames for each unique color.
You said df is a DataFrame. Why is it a dict?
oh, the first error that says dict is the output of putting this code colors = {color: dfc for color,dfc in df.groupby('color')}
the second error i mentioned was a result of this code data.groupby('color')

I ended up doing this for each of the colors.

  blue_data = data[data.color == 'blue']
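
The same boolean-mask filter can be applied once per distinct value instead of being written out by hand for each color. This is only a sketch: the sample data below is hypothetical and stands in for the real data, and frames is an illustrative name.

```python
import pandas as pd

# Hypothetical sample standing in for the real `data`
data = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'blue'],
                     'foo': [1, 2, 3, 4, 5]})

# Apply the mask for every distinct color found in the column
frames = {c: data[data['color'] == c] for c in data['color'].unique()}

blue_data = frames['blue']
print(blue_data)
```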

3 Comments

This is not a good idea. You should use a dictionary for a variable number of variables.
It's inefficient, manual, hard to track, assumes you know your colours beforehand, prone to error, polluting the namespace, unstructured.
hmm i see...well my dataframe consists of pandas series, that is why i did it that way
