How to subset and list a DataFrame using for loop in Python?

Question

I have a DataFrame with 3 columns and 1,000+ rows:

df 
   day         product         order
2010-01-01    150ml Mask          9
2010-01-02    230ml Lotion       27
2010-01-03    600ml Shampoo      33

I would like to split it into one subset per product, as follows:

 df_mask                 df_lotion            df_shampoo  
   day        order        day       order     day         order
2010-01-01      9       2010-01-02    27      2010-01-03    33   
2010-01-09      8       2010-01-05    30      2010-01-04    25
2010-01-11     13       2010-01-06    29      2010-01-06    46

This is how I currently do it:

# Create a list of the distinct product names
productName = df['product'].unique().tolist()

# Return the rows that belong to a single product
def subtable(df, productName):
    return df[df['product'] == productName]

# Subset one product at a time
df_mask = subtable(df, '150ml Mask')
df_lotion = subtable(df, '230ml Lotion')
df_shampoo = subtable(df, '600ml Shampoo')

Since the DataFrame has many different products, is there a way to get all the subsets at once using a for loop?
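A minimal sketch of the loop being asked about, assuming the three-row sample above is representative: instead of one variable per product, it reuses the question's subtable function over the distinct product names and stores each subset in a dict keyed by the full product label. The names subsets and product_name are illustrative only.

```python
import pandas as pd

# Sample data from the question (assumed representative of the full table)
df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-03"],
    "product": ["150ml Mask", "230ml Lotion", "600ml Shampoo"],
    "order": [9, 27, 33],
})

def subtable(df, product_name):
    """Return the rows whose 'product' equals product_name."""
    return df[df["product"] == product_name]

# Visit each distinct product once and keep its subset in a dict
subsets = {}
for product_name in df["product"].unique():
    subsets[product_name] = subtable(df, product_name)

print(subsets["150ml Mask"])
```

A dict scales to any number of products, and each subset is still reachable by name, e.g. subsets["230ml Lotion"]. Note that this scans the whole frame once per product; the groupby answers below do the partitioning in a single pass.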

pansen · Accepted Answer · 2017-03-10 08:24:06Z

4

You can use groupby for this; it does exactly what you need:

# show example data
print(df)

          day        product  order
0  2010-01-01     150ml Mask      9
1  2010-01-02   230ml Lotion     27
2  2010-01-03  600ml Shampoo     33
3  2010-01-04     250ml Mask     12
4  2010-01-05   330ml Lotion     24
5  2010-01-06  400ml Shampoo     13

# split product column and keep only product name
df["product"] = df["product"].str.split(expand=True)[1]

# groupby product
products = df.groupby("product")

# print product and corresponding product df
for product, product_df in products:
    print(product)
    print(product_df)

Lotion
          day product  order
1  2010-01-02  Lotion     27
4  2010-01-05  Lotion     24

Mask
          day product  order
0  2010-01-01    Mask      9
3  2010-01-04    Mask     12

Shampoo
          day  product  order
2  2010-01-03  Shampoo     33
5  2010-01-06  Shampoo     13
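To make the link to the question's subtable function explicit, here is a small check, assuming the product column has already been reduced to the bare names as above: iterating over a groupby yields (key, sub-DataFrame) pairs in sorted key order, and each sub-DataFrame holds the same rows, with the same index labels, that a boolean filter would return.

```python
import pandas as pd

# Data after the split step above (product column already reduced to the name)
df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-03",
            "2010-01-04", "2010-01-05", "2010-01-06"],
    "product": ["Mask", "Lotion", "Shampoo", "Mask", "Lotion", "Shampoo"],
    "order": [9, 27, 33, 12, 24, 13],
})

keys = []
for product, product_df in df.groupby("product"):
    keys.append(product)
    # each group matches the boolean-filter subset for that product
    expected = df[df["product"] == product]
    assert product_df.equals(expected)

print(keys)
```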

To access each subgroup individually, use get_group, which corresponds to your subtable function:

mask_df = products.get_group("Mask")
print(mask_df)

          day product  order
0  2010-01-01    Mask      9
3  2010-01-04    Mask     12
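One practical difference from a boolean filter: get_group raises KeyError for a label with no rows, whereas the filter simply returns an empty frame. A minimal sketch of guarding against that by checking the groupby's groups mapping first; the label "Serum" is a hypothetical product that does not occur in the data.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-04"],
    "product": ["Mask", "Lotion", "Mask"],
    "order": [9, 27, 12],
})
products = df.groupby("product")

mask_df = products.get_group("Mask")

# .groups maps each group label to its row index labels,
# so membership can be tested before calling get_group
label = "Serum"  # hypothetical label, not present in the data
if label in products.groups:
    serum_df = products.get_group(label)
else:
    serum_df = None

print(mask_df)
```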

Finally, to collect all the sub-DataFrames in one dictionary, loop over products and drop the product column itself:

df_dict = {product: product_df.drop("product", axis=1) 
          for product, product_df in products}
print(df_dict["Mask"])

          day  order
0  2010-01-01      9
3  2010-01-04     12
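As a sanity check on the dictionary approach, a sketch (same assumed sample data, product column already reduced to the bare name) that concatenates the pieces back together: because groupby partitions the rows, the recombined frame should equal the original minus the dropped product column once it is sorted back into index order.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-03",
            "2010-01-04", "2010-01-05", "2010-01-06"],
    "product": ["Mask", "Lotion", "Shampoo", "Mask", "Lotion", "Shampoo"],
    "order": [9, 27, 33, 12, 24, 13],
})
products = df.groupby("product")

# one entry per product; the now-redundant 'product' column is dropped
df_dict = {product: product_df.drop("product", axis=1)
           for product, product_df in products}

# every original row lands in exactly one sub-DataFrame
recombined = pd.concat(list(df_dict.values())).sort_index()
assert recombined.equals(df.drop("product", axis=1))
```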

edited Mar 10, 2017 at 8:24

answered Mar 10, 2017 at 7:47

pansen


3 Comments

Peggy Over a year ago

Thank you for your answer. I tried df["product"] = df["product"].str.split(expand=True)[1], but some product names are not consistently formatted; some look like 0.7OZ Mask UK 6. Is there another way to handle this?

pansen Over a year ago

@peggy What are the possible variations of the product labels? Extracting the product name depends entirely on your input data. However, for the example in your comment, df["product"].str.split(expand=True)[1] should successfully extract Mask from 0.7OZ Mask UK 6. Or do you need Mask including the UK 6?

Peggy Over a year ago

Yes, I need Mask UK 6. But I decided to assign each product a number to make sorting easier. Other than that, the code runs well. Thank you very much!
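For labels like the one in this thread, a sketch of the two splitting variants side by side, assuming the size is always the first whitespace-separated token: str.split(expand=True)[1] takes only the second token, while str.split(n=1) splits at the first whitespace only, so str[1] keeps the whole remainder ("Mask UK 6").

```python
import pandas as pd

# Labels in the style mentioned in the comments
# (assumption: the size is always the first token)
s = pd.Series(["150ml Mask", "230ml Lotion", "0.7OZ Mask UK 6"])

# expand=True builds one column per token; column 1 is the second token
second_token = s.str.split(expand=True)[1]

# n=1 splits only at the first whitespace, keeping the rest intact:
# '0.7OZ Mask UK 6' -> ['0.7OZ', 'Mask UK 6']
after_size = s.str.split(n=1).str[1]

print(second_token.tolist())
print(after_size.tolist())
```

If some labels do not start with a size, neither variant is reliable, and a per-product lookup table (like the numbering mentioned above) is the safer route.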

Pintu · Accepted Answer · 2017-03-10 07:29:36Z

0

See if this helps:

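dfs = {}
for name, grp in df.groupby('product'):
    # split gives the product name (e.g. 'Mask') as the key; labels that share
    # a name, such as '150ml Mask' and '250ml Mask', overwrite each other here
    dfs[name.split(' ')[1]] = grp

for key in dfs:
    print(dfs[key])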
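The key collision is easy to reproduce. A minimal sketch, assuming two sizes of the same product should end up in one frame (if sizes are meant to stay separate, keep the full label as the key instead): grouping on the full label and then shortening the key keeps only the last group, whereas grouping directly on the extracted name keeps all rows.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-04"],
    "product": ["150ml Mask", "250ml Mask"],
    "order": [9, 12],
})

# Shortening the key after grouping on the full label:
# both groups map to 'Mask', so the later one replaces the earlier one
dfs = {}
for name, grp in df.groupby("product"):
    dfs[name.split(" ")[1]] = grp

# Grouping on the extracted name keeps every Mask row together
by_name = {name: grp
           for name, grp in df.groupby(df["product"].str.split().str[1])}

print(dfs["Mask"])
print(by_name["Mask"])
```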

answered Mar 10, 2017 at 7:29

Pintu


jezrael · Accepted Answer · 2017-03-10 08:27:26Z

I think you can store all the DataFrames in a dict, built with a dict comprehension over groupby and split:

producs = df['product'].str.split().str[-1]
print (producs)
0       Mask
1     Lotion
2    Shampoo
Name: product, dtype: object

dfs = {i:df.reset_index(drop=True) for i, df in df.groupby(producs)}
print (dfs)
{'Shampoo':           day        product  order
0  2010-01-03  600ml Shampoo     33, 'Mask':           day     product  order
0  2010-01-01  150ml Mask      9, 'Lotion':           day       product  order
0  2010-01-02  230ml Lotion     27}

print (dfs['Shampoo'])
          day        product  order
0  2010-01-03  600ml Shampoo     33
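Two details of this answer, sketched on assumed sample data with two Mask sizes: groupby accepts a derived Series (aligned with df on the index) rather than a column name, so the original product labels survive inside each group; and without reset_index(drop=True) each group keeps the original row labels. The names raw and g are illustrative; using g instead of df for the loop variable avoids visually shadowing the outer frame.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-03", "2010-01-04"],
    "product": ["150ml Mask", "230ml Lotion", "600ml Shampoo", "250ml Mask"],
    "order": [9, 27, 33, 12],
})

# last token of each label; groupby aligns this Series with df by index
names = df["product"].str.split().str[-1]

# without reset_index each group keeps the original row labels
raw = {name: g for name, g in df.groupby(names)}

# reset_index(drop=True) renumbers each group from 0 and discards old labels
dfs = {name: g.reset_index(drop=True) for name, g in df.groupby(names)}

print(dfs["Mask"])
```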

If you need to remove the product column, select the remaining columns with [['day','order']] or use drop:

dfs = {i:df.reset_index(drop=True)[['day','order']] for i, df in df.groupby(producs)}
#dfs = {i:df.reset_index(drop=True).drop('product', axis=1) for i, df in df.groupby(producs)}
print (dfs)
{'Shampoo':           day  order
0  2010-01-03     33, 'Mask':           day  order
0  2010-01-01      9, 'Lotion':           day  order
0  2010-01-02     27}

print (dfs['Shampoo'])
          day  order
0  2010-01-03     33
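Both ways of removing the product column give the same result for this data; a small check on assumed sample data follows. The choice mostly matters if columns are added later: an explicit list like [['day','order']] silently ignores new columns, while drop(columns="product") carries them through.

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-04"],
    "product": ["150ml Mask", "250ml Mask"],
    "order": [9, 12],
})
names = df["product"].str.split().str[-1]

# Option 1: keep an explicit list of columns
kept = {n: g.reset_index(drop=True)[["day", "order"]]
        for n, g in df.groupby(names)}

# Option 2: drop only the unwanted column
dropped = {n: g.reset_index(drop=True).drop(columns="product")
           for n, g in df.groupby(names)}

print(kept["Mask"])
```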
