Data Analysis using Python Pandas

Question

I am new to the Pandas library and need some help. I have two columns like this:

Test Result   Risk Rating
Fail          Low
Pass          Medium
Skip          High
Pass          Low
Fail          Medium
Pass          High
Skip          Low
Fail          Medium
Fail          High

Now, I need the total count of Fail, Pass and Skip in the "Test Result" column, which I am able to get. But I also need the number of rows with "Fail" in the Test Result column and "Low" in the Risk Rating column, then Fail with Medium, and so on for every combination. My final result should look like:

Fail (Low Risk Rating) = 1
Fail (Medium Risk Rating) = 2
Fail (High Risk Rating) = 1
Pass (Low Risk Rating) = 1
Pass (Medium Risk Rating) = 1
Pass (High Risk Rating) = 1
Skip (Low Risk Rating) = 1
Skip (Medium Risk Rating) = 0
Skip (High Risk Rating) = 1

How can I do this? Any help would be appreciated.
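For anyone reproducing this, here is a minimal sketch that rebuilds the sample data as a DataFrame named df (the name the answer's code assumes) and shows the two kinds of count mentioned so far: per-result totals with value_counts, and a single Fail/Low combination via a boolean mask. Repeating the mask for every pair works but gets tedious, which is what the grouped approaches in the answer avoid.

```python
import pandas as pd

# Sample data copied from the question's two columns
df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Total count per Test Result (the part that already works): Fail 4, Pass 3, Skip 2
totals = df['Test Result'].value_counts()
print(totals)

# One combination counted with a boolean mask: Fail AND Low
fail_low = ((df['Test Result'] == 'Fail') & (df['Risk Rating'] == 'Low')).sum()
print(fail_low)  # 1
```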

jezrael · Accepted Answer · 2016-10-28 11:28:14Z

I think you need groupby on both columns and then aggregate with size:

counts = df.groupby(['Test Result', 'Risk Rating']).size().reset_index(name='counts')
print(counts)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1
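Before reset_index, size() returns a Series indexed by (Test Result, Risk Rating) pairs, which is handy for looking up one count. A short sketch on the question's data; note that a combination that never occurs, such as Skip with Medium, is simply absent from that index, so a default has to be supplied when looking it up.

```python
import pandas as pd

df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Series with a MultiIndex of (Test Result, Risk Rating) pairs; 8 pairs occur
sizes = df.groupby(['Test Result', 'Risk Rating']).size()

# Look up one combination by its tuple key
print(sizes.loc[('Fail', 'Medium')])  # 2

# Absent combinations are not in the index, so fall back to 0 via a plain dict
print(('Skip', 'Medium') in sizes.index)  # False
print(sizes.to_dict().get(('Skip', 'Medium'), 0))  # 0
```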

A perhaps nicer, pivot-table-like layout comes from unstack:

wide = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0)
print(wide)
Risk Rating  High  Low  Medium
Test Result                   
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0
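The fill_value=0 argument is what turns the one missing combination (Skip with Medium) into 0. A small comparison sketch on the question's data shows what happens without it: unstack inserts NaN in that cell, and the counts are upcast to a float dtype to hold it.

```python
import pandas as pd

df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

sizes = df.groupby(['Test Result', 'Risk Rating']).size()

# Without fill_value: the absent (Skip, Medium) cell becomes NaN
wide_nan = sizes.unstack()
print(wide_nan.loc['Skip', 'Medium'])  # nan
print(wide_nan['Medium'].dtype)        # a float dtype, because of the NaN

# With fill_value=0: the cell is 0 and the counts stay integers
wide = sizes.unstack(fill_value=0)
print(wide.loc['Skip', 'Medium'])      # 0
print(wide['Medium'].dtype)            # an integer dtype
```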

Or a slower solution with crosstab:

import pandas as pd
table = pd.crosstab(df['Test Result'], df['Risk Rating'])
print(table)
Risk Rating  High  Low  Medium
Test Result                   
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0
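crosstab can also append row and column totals through its margins parameter, which puts the per-result totals from the question in the same table. A sketch on the question's data; the totals row and column are labelled All by default (controlled by margins_name).

```python
import pandas as pd

df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# margins=True adds an 'All' row and an 'All' column holding the totals
table = pd.crosstab(df['Test Result'], df['Risk Rating'], margins=True)
print(table)
# All column: Fail 4, Pass 3, Skip 2; grand total (All, All) is 9
```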

If you also need the missing combinations listed with a count of 0, add stack after unstack:

counts = (df.groupby(['Test Result', 'Risk Rating'])
            .size()
            .unstack(fill_value=0)
            .stack()
            .reset_index(name='counts'))
print(counts)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1
8        Skip      Medium       0
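To print the counts in exactly the shape asked for in the question, the zero-filled table can be reindexed to the question's Low, Medium, High order and walked row by row. A minimal sketch; the column order and label text are taken from the question's expected output, and reindex(..., fill_value=0) would also add a rating that never appears in the data.

```python
import pandas as pd

df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Zero-filled wide table, with columns put in the question's order
wide = (df.groupby(['Test Result', 'Risk Rating'])
          .size()
          .unstack(fill_value=0)
          .reindex(columns=['Low', 'Medium', 'High'], fill_value=0))

# One line per (result, rating) pair, e.g. "Fail (Medium Risk Rating) = 2"
lines = [
    f'{result} ({rating} Risk Rating) = {wide.loc[result, rating]}'
    for result in wide.index
    for rating in wide.columns
]
print('\n'.join(lines))
```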

edited Oct 28, 2016 at 11:28

answered Oct 28, 2016 at 11:21

jezrael

2 Comments

Raman Balyan Over a year ago

Thanks. I am using df = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0), but I am not able to get particular values out of the resulting df. For example, I only need the 'FAIL' values for 'HIGH', 'LOW' and 'MEDIUM'.

jezrael Over a year ago

I think you need boolean indexing.
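Boolean indexing works on the long counts table; on the wide unstack table, a label lookup with .loc does the same job. A sketch of both on the question's data, using Fail as spelled in the data (values are case-sensitive, so 'FAIL' would match nothing).

```python
import pandas as pd

df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Long format: one row per (Test Result, Risk Rating) pair that occurs
counts = df.groupby(['Test Result', 'Risk Rating']).size().reset_index(name='counts')

# Boolean indexing keeps only the Fail rows
fail_rows = counts[counts['Test Result'] == 'Fail']
print(fail_rows)

# Wide format: select the Fail row by its index label
wide = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0)
fail_series = wide.loc['Fail']
print(fail_series)  # High 1, Low 1, Medium 2

# A single cell
print(wide.loc['Fail', 'High'])  # 1
```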
