Aggregating data using pandas python

Question

I have data similar to the following:

Table 1

Colour  Make
Red     Ford
Blue    BMW
Blue    BMW
Green   Golf
Yellow  Audi
Yellow  Audi
Yellow  Audi

Table 2

Colour  Make    Count
Green   Ford    5
Blue    BMW     1
Green   Golf    6
Orange  BMW     1

I would like to use pandas to aggregate the data in Table 1, then either increment the count in Table 2 if the record already exists, or insert a new record if it does not. From the example data above:

Resultant table:

Colour  Make    Count
Green   Ford    5
Blue    BMW     3
Green   Golf    7
Orange  BMW     1
Red     Ford    1
Yellow  Audi    3

To complete the first aggregation step, I have:

df1.groupby(["Colour", "Make"]).size().reset_index(name="Count")

However, I'm not sure how to approach the second step. I'm inclined to opt for some kind of loop-based solution, but I've read that this is a no-no.

What would be the most appropriate way to get to the resultant table?

Thank you in advance.

BENY · Accepted Answer · 2018-12-19 19:42:18Z

2

Using concat, then groupby with sum:

pd.concat([df1.assign(Count=1),df2]).groupby(['Colour','Make']).Count.sum().reset_index()
Out[127]: 
   Colour  Make  Count
0    Blue   BMW      3
1   Green  Ford      5
2   Green  Golf      7  # check your expected output at this line
3  Orange   BMW      1
4     Red  Ford      1
5  Yellow  Audi      3
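
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same concat approach. The frames `df1` and `df2` are rebuilt from the tables in the question; the names `combined` and `res_concat` are only illustrative. It selects the column with `["Count"]` rather than attribute access, which should behave the same way here.

```python
import pandas as pd

# Sample data rebuilt from Table 1 and Table 2 in the question.
df1 = pd.DataFrame({
    "Colour": ["Red", "Blue", "Blue", "Green", "Yellow", "Yellow", "Yellow"],
    "Make": ["Ford", "BMW", "BMW", "Golf", "Audi", "Audi", "Audi"],
})
df2 = pd.DataFrame({
    "Colour": ["Green", "Blue", "Green", "Orange"],
    "Make": ["Ford", "BMW", "Golf", "BMW"],
    "Count": [5, 1, 6, 1],
})

# Give every raw row a count of 1, stack it on top of the existing counts,
# then sum the counts per (Colour, Make) pair.
combined = pd.concat([df1.assign(Count=1), df2], ignore_index=True)
res_concat = combined.groupby(["Colour", "Make"])["Count"].sum().reset_index()

print(res_concat)
```

Pairs that appear in only one of the two tables simply keep their own total, so no explicit update-or-insert logic is needed.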


1 Comment

sytup Over a year ago

I get an error when I use .Count. I replaced it with ["Count"] and it works as expected.

jpp · Accepted Answer · 2018-12-19 22:16:27Z

1

You can align indices and structure, then use pd.DataFrame.add with fill_value=0.

res = df1.groupby(['Colour', 'Make']).size().to_frame('Count')\
         .add(df2.set_index(['Colour', 'Make']), fill_value=0)\
         .astype(int).reset_index()

print(res)

   Colour  Make  Count
0    Blue   BMW      3
1   Green  Ford      5
2   Green  Golf      7
3  Orange   BMW      1
4     Red  Ford      1
5  Yellow  Audi      3
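
A self-contained version of the same idea, as a rough sketch: `df1` and `df2` are rebuilt from the tables in the question, and the intermediate names `new_counts` and `existing` are only illustrative. It assumes each Colour/Make pair appears at most once in `df2`, so that the two indices can be aligned cleanly.

```python
import pandas as pd

# Sample data rebuilt from Table 1 and Table 2 in the question.
df1 = pd.DataFrame({
    "Colour": ["Red", "Blue", "Blue", "Green", "Yellow", "Yellow", "Yellow"],
    "Make": ["Ford", "BMW", "BMW", "Golf", "Audi", "Audi", "Audi"],
})
df2 = pd.DataFrame({
    "Colour": ["Green", "Blue", "Green", "Orange"],
    "Make": ["Ford", "BMW", "Golf", "BMW"],
    "Count": [5, 1, 6, 1],
})

# Counts from the raw rows, indexed by (Colour, Make).
new_counts = df1.groupby(["Colour", "Make"]).size().to_frame("Count")

# Existing counts, given the same index structure so the frames can align.
existing = df2.set_index(["Colour", "Make"])

# fill_value=0 treats a pair missing from one side as a count of 0.
# The addition goes through float, so cast back to int afterwards.
res_add = new_counts.add(existing, fill_value=0).astype(int).reset_index()

print(res_add)
```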
