Groupby using loops in Python

Question

I have ten dataframes with the same column names, 'Name' and 'data'.

Using groupby and aggregation on each dataset gives me the desired output, but repeating this for ten datasets is a lot of effort, and the margin for error increases because I need to keep the datasets separate. Examples and code are provided below.

Df1:

Name data
Foo  Product
Foo  Misc
Bar  Product
Bar  Item

Df2:

Name data
Foo  Misc
Foo  Product
Bar  Product
Bar  Item

Desired output:
Df1:

Name data
Foo  Product,Misc
Bar  Product,Item

Df2:

Name data
Foo  Misc,Product
Bar  Product,Item

Currently I am using the code below to achieve this:

Group1 = Df1.groupby('Name')['data'].agg([('data', ','.join)]).reset_index()

Group2 = Df2.groupby('Name')['data'].agg([('data', ','.join)]).reset_index()

I tried the loop below, but it did not work:

Group = [Df1, Df2]

for df in Group:
    df.groupby('Name')['data'].agg([('data', ','.join)]).reset_index()

Based on some suggestions, I also tried this:

Group = [Df1, Df2]

for df in Group:
    df = df.groupby('Name')['data'].agg([('data', ','.join)]).reset_index()

Neither attempt raised an error, but neither produced any result: the exported file still contains the unchanged data.

Do you want the new data stored in memory somewhere, or printed out?
– Quang Hoang, Commented Feb 20, 2020 at 19:02
Then shouldn't you do an assignment, e.g. for df in Group: df = df.groupby['Name']...?
– Quang Hoang, Commented Feb 20, 2020 at 19:06
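
A note on why neither loop changes anything: `for df in Group` binds the name `df` to each DataFrame in turn, and `df = ...` only rebinds that loop variable. The list `Group` and the variables `Df1` and `Df2` still refer to the original objects. The following is a minimal sketch, assuming the two sample frames from the question; the names `unchanged` and `grouped` are introduced here purely for illustration.

```python
import pandas as pd

Df1 = pd.DataFrame({"Name": ["Foo", "Foo", "Bar", "Bar"],
                    "data": ["Product", "Misc", "Product", "Item"]})
Df2 = pd.DataFrame({"Name": ["Foo", "Foo", "Bar", "Bar"],
                    "data": ["Misc", "Product", "Product", "Item"]})

Group = [Df1, Df2]

# Rebinding the loop variable leaves the list (and Df1/Df2) untouched.
for df in Group:
    df = df.groupby("Name")["data"].agg(",".join).reset_index()

unchanged = Group[0] is Df1  # the list still holds the original frame

# Collecting the results in a new list keeps them.
# sort=False keeps each Name in the order it first appears.
grouped = [
    df.groupby("Name", sort=False)["data"].agg(",".join).reset_index()
    for df in Group
]
```

With ten frames, the same comprehension applies unchanged; only the contents of `Group` grow.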

Scott Boston · Accepted Answer · 2020-02-20 20:35:58Z

2

My suggestion is to use a dictionary.

dd = {'Df1':Df1,
      'Df2':Df2}


for k, v in dd.items():
    dd[k] = v.groupby('Name').agg(list)

dd

Output:

{'Df1':                  Data
 Name                 
 Bar   [Product, Item]
 Foo   [Product, Misc], 
 'Df2':                  Data
 Name                 
 Bar   [Product, Item]
 Foo   [Misc, Product]}

answered Feb 20, 2020 at 20:35

Scott Boston


4 Comments

ragethewolf Over a year ago

Thanks for the suggestion, Scott, but I need to extract the data into Excel.

Scott Boston Over a year ago

@ragethewolf date? You can still access the dataframes using dd['Df1'] and dd['Df2'].

ragethewolf Over a year ago

Oh, I didn't know that. I will try this out too. Thanks, Scott.

ragethewolf Over a year ago

Thanks for this, Scott. I used this for my programme.
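
For the Excel export mentioned in the comments, one possible approach is to write each entry of the dictionary to its own sheet. This is only a sketch under some assumptions: the column is named `Data` as in the output above, the values are joined into comma-separated strings rather than lists, and the helper `export_grouped` and the file name `grouped.xlsx` are hypothetical. Writing `.xlsx` files also requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd


def export_grouped(frames, path):
    """Write each frame in the dict `frames` to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for sheet_name, frame in frames.items():
            # index=False omits the 0..n-1 row labels; 'Name' is a regular column.
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


Df1 = pd.DataFrame({"Name": ["Foo", "Foo", "Bar", "Bar"],
                    "Data": ["Product", "Misc", "Product", "Item"]})
Df2 = pd.DataFrame({"Name": ["Foo", "Foo", "Bar", "Bar"],
                    "Data": ["Misc", "Product", "Product", "Item"]})

dd = {"Df1": Df1, "Df2": Df2}

# Join each Name's values into one string; reset_index() keeps 'Name' as a column.
dd = {
    key: frame.groupby("Name")["Data"].agg(",".join).reset_index()
    for key, frame in dd.items()
}

# Hypothetical usage (needs an Excel engine such as openpyxl):
# export_grouped(dd, "grouped.xlsx")
```

Keeping `Name` as a column before exporting means the sheet contains it regardless of how the index is handled.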

Pygirl · Accepted Answer · 2020-02-20 20:41:45Z

1

import pandas as pd

Df1 = pd.DataFrame({'Name':['Foo','Foo','Bar','Bar'],
                   'Data':['Product','Misc', 'Product', 'Item'],
                   })

Df2 = pd.DataFrame({'Name':['Foo','Foo','Bar','Bar'],
                   'Data':['Misc', 'Product', 'Product', 'Item'],
                   })

Solution

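fields = [f'Df{i}' for i in range(1, 3)]   # variable names: 'Df1', 'Df2'
dfsout = [Df1, Df2]
variables = locals()   # at module level this is the same dict as globals()
for d, name in zip(dfsout, fields):
    # Gather each Name's values into a list, then turn Name back into a column.
    variables[name] = d.groupby('Name')['Data'].apply(list).reset_index()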

Df1:

  Name     Data
0  Foo  Product
1  Foo     Misc
2  Bar  Product
3  Bar     Item

Df2:

  Name     Data
0  Foo     Misc
1  Foo  Product
2  Bar  Product
3  Bar     Item

After applying the solution:

Df1:

  Name             Data
0  Bar  [Product, Item]
1  Foo  [Product, Misc]

Df2:

  Name             Data
0  Bar  [Product, Item]
1  Foo  [Misc, Product]
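
One caveat: writing into `locals()` works at module level, where it is the same dictionary as `globals()`, but Python does not guarantee that such writes affect local variables inside a function. A sketch of an alternative that rebinds the same names by ordinary assignment follows; the helper name `collect_by_name` is made up for this example.

```python
import pandas as pd


def collect_by_name(frame):
    """Return one row per Name, with that Name's Data values gathered in a list."""
    return frame.groupby("Name")["Data"].apply(list).reset_index()


Df1 = pd.DataFrame({"Name": ["Foo", "Foo", "Bar", "Bar"],
                    "Data": ["Product", "Misc", "Product", "Item"]})
Df2 = pd.DataFrame({"Name": ["Foo", "Foo", "Bar", "Bar"],
                    "Data": ["Misc", "Product", "Product", "Item"]})

# Tuple unpacking rebinds Df1 and Df2 without touching locals().
Df1, Df2 = (collect_by_name(frame) for frame in (Df1, Df2))
```

This stays readable for two or three frames; for ten, a list or dictionary of frames is probably easier to maintain than ten separate names.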

edited Feb 20, 2020 at 20:41

answered Feb 20, 2020 at 20:24

Pygirl


3 Comments

ragethewolf Over a year ago

This is perfect. Just one more thing I need help with: when I export the data to xlsx, my Name column disappears.

Pygirl Over a year ago

Add .reset_index(level=0) to ---> variables["{0}".format(name)]=pd.DataFrame(d.groupby('Name')['Data'].apply(list))

Pygirl Over a year ago

I have edited my answer. Your Name column had become the index, which is why it was missing when exporting. I changed Name back from the index to a column by using reset_index.

trigonom · Accepted Answer · 2020-02-20 20:51:38Z

0

a = [df1, df2]
for i, df in enumerate(a):
    # Assign by index: rebinding the loop variable df would leave the list unchanged.
    a[i] = df.groupby(['Name'])['data'].apply(','.join).reset_index()

This does not change df1 and df2 themselves, but a[0] and a[1] now hold the grouped tables, so if you don't mind accessing them through the list, the updated tables are there.

answered Feb 20, 2020 at 20:51

trigonom
