I downloaded the Kickstarter dataset from Kaggle, and now I would like to see the most popular categories, split up by the three states ('successful', 'failed', 'canceled'). I was hoping to get an output like:

main_category   state       name    goal    start_date
Film & Video    failed      29653   29652   29653   
Film & Video    successful  21404   21404   21404
Film & Video    canceled    5162    5162    5162
Music           successful  21763   21763   21763
Music           failed      19193   19193   19193
Publishing      failed      19920   19920   19920
Technology      failed      16347   16347   16347
Technology      successful  5062    5062    5062
Technology      canceled    3749    3749    3749
Fashion         successful  4310    4310    4310
Fashion         failed      11500   11500   11500

I tried ks.groupby(['main_category','state']).count().sort_values('name', ascending=False), but that sorts all rows by their raw counts, so the states of one category are no longer kept together:

Film & Video    failed      29653   29652   29653
Music           successful  21763   21763   21763
Film & Video    successful  21404   21404   21404
Publishing      failed      19920   19920   19920
Music           failed      19193   19193   19193
Technology      failed      16347   16347   16347
Food            failed      13602   13602   13602

I'm not sure how to sort on the total count per main_category and then subsort on the state. I tried sorting on multiple columns, but the primary sort is still on the absolute numbers.

2 Answers

Here's a solution:

ks.groupby(['main_category','state']).count()[["name"]].reset_index().sort_values(["main_category","name"], ascending=False)

The idea is, after groupby and count, you will need to reset_index and then sort_values.
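To make the steps concrete, here is a minimal sketch of that chain on a tiny made-up frame. The rows below are invented for illustration and only borrow the column names from the question; they are not the Kaggle data.

```python
import pandas as pd

# Toy stand-in for the Kickstarter data (invented rows, assumed column names).
ks = pd.DataFrame({
    "ID": range(1, 10),
    "name": [f"p{i}" for i in range(1, 10)],
    "main_category": ["Film & Video"] * 6 + ["Music"] * 3,
    "state": ["failed", "failed", "failed", "successful", "successful",
              "canceled", "successful", "successful", "failed"],
})

counts = (
    ks.groupby(["main_category", "state"])
      .count()[["name"]]      # keep a single count column
      .reset_index()          # turn the group keys back into ordinary columns
      .sort_values(["main_category", "name"], ascending=False)
)
print(counts)
```

Note that with this sort key the categories come out in reverse alphabetical order (Music before Film & Video here), with the states ordered by count inside each category.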

1 Comment

This does not sort for the most popular main_categories.

I believe I solved it. I'm not sure it's the most elegant solution, but it orders the output by the most popular main_categories. First I needed to broadcast the total count per main_category into a separate, new column:

ks['total'] = ks.groupby('main_category')['ID'].transform('count')

Then I group by ['total', 'main_category', 'state'], followed by the reset_index mentioned in Yilun's answer, and sort on the total first and the per-state count second:

ks.groupby(['total','main_category','state']).count().reset_index().sort_values(['total','ID'], ascending=False) 
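Here is a small self-contained sketch of the two steps, run on an invented toy frame rather than the real dataset. The column names follow the question; the rows are assumptions made up for the example.

```python
import pandas as pd

# Toy stand-in for the Kickstarter data (invented rows, assumed column names).
ks = pd.DataFrame({
    "ID": range(1, 10),
    "name": [f"p{i}" for i in range(1, 10)],
    "main_category": ["Film & Video"] * 6 + ["Music"] * 3,
    "state": ["failed", "failed", "failed", "successful", "successful",
              "canceled", "successful", "successful", "failed"],
})

# Step 1: broadcast each category's total row count onto every row.
ks["total"] = ks.groupby("main_category")["ID"].transform("count")

# Step 2: count per (total, category, state), then sort by the category
# total first and by the per-state count second, both descending.
result = (
    ks.groupby(["total", "main_category", "state"])
      .count()
      .reset_index()
      .sort_values(["total", "ID"], ascending=False)
)
print(result)
```

On this toy data the larger category (Film & Video) comes first, with its states ordered by count, followed by Music. If two categories happened to have exactly the same total, their rows could interleave; sorting on ['total', 'main_category', 'ID'] instead should keep each category's rows together.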
