I am new to the pandas library and need some help. I have two columns like this:

Test Result       Risk Rating
  Fail               Low                   
  Pass               Medium
  Skip               High
  Pass               Low                   
  Fail               Medium
  Pass               High
  Skip               Low                   
  Fail               Medium
  Fail               High

I need the total count of Fail, Pass and Skip in the "Test Result" column, and I am already able to get that. I also need the number of rows where "Test Result" is "Fail" and "Risk Rating" is "Low", the number of "Fail" rows with "Medium", and so on for every combination. My final result should look like:

Fail (Low Risk Rating) = 1
Fail (Medium Risk Rating) = 2
Fail (High Risk Rating) = 1
Pass (Low Risk Rating) = 1
Pass (Medium Risk Rating) = 1
Pass (High Risk Rating) = 1
Skip (Low Risk Rating) = 1
Skip (Medium Risk Rating) = 0
Skip (High Risk Rating) = 1

How can I do this? Any help would be appreciated.

1 Answer

I think you need to group by both columns and aggregate with size:

df = df.groupby(['Test Result', 'Risk Rating']).size().reset_index(name='counts')
print (df)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1

A pivot-style table built with unstack may be nicer to read:

df = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0)
print (df)
Risk Rating  High  Low  Medium
Test Result                   
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0

Or a slower solution with crosstab:

import pandas as pd

df = pd.crosstab(df['Test Result'], df['Risk Rating'])
print (df)
Risk Rating  High  Low  Medium
Test Result                   
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0

If you need the missing combinations listed with a count of 0, add stack after unstack:

# The outer parentheses let the method chain span several lines.
df = (df.groupby(['Test Result', 'Risk Rating'])
        .size()
        .unstack(fill_value=0)
        .stack()
        .reset_index(name='counts'))
print (df)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1
8        Skip      Medium       0
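
To get from these counts to the exact lines requested in the question, one option is to build the wide table and format each cell. The following is a minimal sketch, not part of the original answer: the sample frame is copied from the question, and `rating_order`, `counts` and `lines` are names introduced here for illustration. The Low, Medium, High order is assumed from the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Assumed display order, taken from the desired output in the question.
rating_order = ['Low', 'Medium', 'High']

# Count every (result, rating) pair. unstack(fill_value=0) puts 0 where a
# pair never occurs (Skip/Medium); reindex fixes the column order.
counts = (df.groupby(['Test Result', 'Risk Rating'])
            .size()
            .unstack(fill_value=0)
            .reindex(columns=rating_order, fill_value=0))

# One formatted line per (result, rating) cell of the wide table.
lines = [f"{result} ({rating} Risk Rating) = {counts.loc[result, rating]}"
         for result in counts.index
         for rating in rating_order]
print("\n".join(lines))
```

Because the wide table already contains the zero for Skip/Medium, no extra stack step is needed for this output.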

2 Comments

Thanks. I am using df = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0), but I am not able to get particular values out of the resulting df. For example, I just need the 'FAIL' counts for 'HIGH', 'LOW' and 'MEDIUM'.
I think you need boolean indexing
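
As a sketch of what this could look like (an assumption about the commenter's intent, not code from the thread): on the unstacked table, 'Fail' is an index label, so `.loc` selects that row directly, while boolean indexing applies to the long format where 'Test Result' is an ordinary column. The names `wide`, `long_df`, `fail_counts` and `fail_rows` are hypothetical. Labels are case-sensitive, so with the question's data the key is 'Fail', not 'FAIL'.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Wide format: results are the index, ratings are the columns.
wide = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0)

# Label-based selection of the 'Fail' row; the result is a Series
# indexed by rating, so single values can be read by rating name.
fail_counts = wide.loc['Fail']
print(fail_counts['High'], fail_counts['Low'], fail_counts['Medium'])

# Long format: 'Test Result' is a regular column, so a boolean mask
# keeps only the rows where it equals 'Fail'.
long_df = (df.groupby(['Test Result', 'Risk Rating'])
             .size()
             .reset_index(name='counts'))
fail_rows = long_df[long_df['Test Result'] == 'Fail']
print(fail_rows)
```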
