I have a dataset with columns "Company" and "Outcome".

Company is a company name; Outcome is either Success or Failure.

I have created the following graph with this code. How can I sort this so that the top has the most "Outcomes" (i.e. combined Success and Failure) and it goes down in a descending manner?

The code I used is:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Company': ['C', 'A', 'A', 'B', 'B', 'B', 'B'], 'Outcome': ['Success', 'Success', 'Failure', 'Failure', 'Success', 'Failure', 'Success']})

df.groupby(['Company', 'Outcome']).size().unstack().plot(kind = 'barh', stacked=True)

plt.show()

Additionally, including sort_values() after size() does not appear to have any effect, so clearly I am using it wrong. Any advice?

[Image: bar graph]

  • Please add a dummy dataframe. Also, when you use sort_values, what are you sorting on? 'Company'? If so, it would sort alphabetically. Commented Apr 4, 2022 at 2:35
  • I'm new here - I've just updated the code to a simple example. The dummy dataframe isn't showing up despite being in the code - df = pd.DataFrame({'Company': ['C', 'A', 'A', 'B', 'B', 'B', 'B'], 'Outcome': ['Success', 'Success', 'Failure', 'Failure', 'Success', 'Failure', 'Success']}) Commented Apr 4, 2022 at 2:57

1 Answer

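As noted in the comments, it matters what you sort on. A horizontal bar chart draws the rows from the bottom up in the order they appear in the data, so after sorting in descending order the y-axis has to be inverted to put the largest bar at the top.

Added: to sort by the combined number of successes and failures, add a Total column and sort on it.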

import pandas as pd

df2 = pd.DataFrame({'Company': ['C', 'A', 'A', 'B', 'B', 'B', 'B', 'D', 'D', 'D', 'D', 'D', 'D'],
                    'Outcome': ['Success', 'Success', 'Failure', 'Failure', 'Success', 'Failure','Success','Success','Success','Success','Success','Success','Success' ]})

df3 = df2.groupby(['Company', 'Outcome']).size().unstack()
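# Fill both columns so a company with only successes or only failures still gets a numeric total
df3['Total'] = df3['Failure'].fillna(0) + df3['Success'].fillna(0)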
df3.sort_values('Total', ascending=False, inplace=True)
ax = df3[['Failure','Success']].plot(kind='barh', stacked=True)
ax.invert_yaxis()
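
The Total column can also be avoided altogether. The sketch below is one possible variant rather than the only way to do it: it assumes the two-column frame from the question, uses row sums to get the combined count per company, and reorders the table by that sum. The function names counts_sorted_by_total and plot_sorted are made up for this illustration. It also suggests why calling sort_values() right after size() seemed to do nothing: unstack() rebuilds the row index in sorted order by default, so any ordering applied before it is lost.

```python
import pandas as pd


def counts_sorted_by_total(df):
    """Count outcomes per company, ordered from most to fewest total outcomes."""
    # fill_value=0 stores 0 instead of NaN for company/outcome pairs that never occur
    counts = df.groupby(['Company', 'Outcome']).size().unstack(fill_value=0)
    # Row sums give the combined Success + Failure count for each company
    order = counts.sum(axis=1).sort_values(ascending=False).index
    # Reorder the rows only after unstack(), so the ordering is kept
    return counts.loc[order]


def plot_sorted(df):
    """Draw stacked horizontal bars with the largest total at the top."""
    ax = counts_sorted_by_total(df).plot(kind='barh', stacked=True)
    # barh draws the first row at the bottom, so flip the axis
    ax.invert_yaxis()
    return ax


df2 = pd.DataFrame({
    'Company': ['C', 'A', 'A', 'B', 'B', 'B', 'B', 'D', 'D', 'D', 'D', 'D', 'D'],
    'Outcome': ['Success', 'Success', 'Failure', 'Failure', 'Success', 'Failure', 'Success',
                'Success', 'Success', 'Success', 'Success', 'Success', 'Success'],
})
counts = counts_sorted_by_total(df2)
```

With matplotlib.pyplot imported as plt, calling plot_sorted(df2) followed by plt.show() should draw D at the top, followed by B, A and C.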

3 Comments

This is part of the way there. I've applied to a slightly bigger dataset and it appears to be sorting by the number of Successes first, and then subsequently by the number of Failures. I was hoping to get it to sort by total number of successes + failures. For example, it doesn't show this neatly arranged in terms of total - df = pd.DataFrame({'Company': ['C', 'A', 'A', 'B', 'B', 'B', 'B', 'D', 'D', 'D', 'D', 'D', 'D'], 'Outcome': ['Success', 'Success', 'Failure', 'Failure', 'Success', 'Failure','Success','Success','Success','Success','Success','Success','Success' ]})
If you need to sort by total, add a new total column and then sort by total column.
How do I assign the appropriate values to the Total column?
