How can I use pandas to obtain a summary table from my data below:

ID     Condition  Confirmed
D0119  Bad        Yes
D0119  Good       No
D0117  Bad        Yes
D0110  Bad        Undefined
D1011  Bad        Yes
D1011  Good       Yes
D1001  Bad        Yes
D1001  Bad        Yes

Required output:

ID     Condition  Confirmed  %Bad
D0119  Bad,Good   Yes, No    50
D0117  Bad        Yes        100
D0110  Bad        Undefined  0
D1011  Bad,Good   Yes, Yes
D1001  Bad,Bad    Yes, Yes   100

Can anyone help? Thanks

2 Answers

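You can do it this way (note that Bad comes out as a fraction between 0 and 1, not a percentage, and the grouped values are lists):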

In [123]: (df.assign(Bad=df.Condition=='Bad')
     ...:    .groupby('ID')
     ...:    .agg({'Condition':pd.Series.tolist,
     ...:          'Confirmed':pd.Series.tolist,
     ...:          'Bad':'mean'})
     ...: )
     ...:
Out[123]:
       Bad    Condition    Confirmed
ID
D0110  1.0        [Bad]  [Undefined]
D0117  1.0        [Bad]        [Yes]
D0119  0.5  [Bad, Good]    [Yes, No]
D1001  1.0   [Bad, Bad]   [Yes, Yes]
D1011  0.5  [Bad, Good]   [Yes, Yes]
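
If comma-separated strings and a percentage are preferred over lists and a fraction, the same idea can be sketched with named aggregation (pandas 0.25 or later). The column name pct_bad and the use of sort=False to keep the IDs in their original order are my own choices, not part of the question. This counts every 'Bad' row, so D0110 comes out as 100 rather than the 0 shown in the required output.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
    'Condition': ['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
    'Confirmed': ['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes'],
})

summary = (
    # Flag each row as bad (1.0) or not (0.0) so the group mean is the bad share.
    df.assign(Bad=df['Condition'].eq('Bad').astype(float))
      .groupby('ID', sort=False)          # sort=False keeps first-appearance order
      .agg(Condition=('Condition', ','.join),
           Confirmed=('Confirmed', ', '.join),
           pct_bad=('Bad', 'mean'))       # pct_bad is a hypothetical column name
)
summary['pct_bad'] = summary['pct_bad'] * 100  # fraction -> percentage

print(summary)
```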

Vertical variant, which keeps one row per record and adds the percentage as a new column:

In [113]: df
Out[113]:
      ID Condition  Confirmed
0  D0119       Bad        Yes
1  D0119      Good         No
2  D0117       Bad        Yes
3  D0110       Bad  Undefined
4  D1011       Bad        Yes
5  D1011      Good        Yes
6  D1001       Bad        Yes
7  D1001       Bad        Yes

In [114]: g = df.assign(Bad=df.Condition=='Bad').groupby('ID')

In [115]: df['Bad'] = df['ID'].map(g['Bad'].mean() * 100)

In [116]: df
Out[116]:
      ID Condition  Confirmed    Bad
0  D0119       Bad        Yes   50.0
1  D0119      Good         No   50.0
2  D0117       Bad        Yes  100.0
3  D0110       Bad  Undefined  100.0
4  D1011       Bad        Yes   50.0
5  D1011      Good        Yes   50.0
6  D1001       Bad        Yes  100.0
7  D1001       Bad        Yes  100.0
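
The per-row column can also be built without the intermediate map step. This is a minimal sketch using groupby with transform, which broadcasts each group's mean back to its rows. The cast to float is a precaution I added so the result dtype does not depend on how a given pandas version handles boolean input.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
    'Condition': ['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
    'Confirmed': ['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes'],
})

# 1.0 where the row is 'Bad', 0.0 otherwise.
is_bad = df['Condition'].eq('Bad').astype(float)

# transform('mean') returns one value per original row (the mean of its ID group),
# so the result aligns with df and can be assigned directly.
df['Bad'] = is_bad.groupby(df['ID']).transform('mean') * 100

print(df)
```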

Consider something along the following lines.

import pandas as pd

df = pd.DataFrame({'ID':['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
                   'Condition':['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
                   'Confirmed':['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes']})

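# Drop rows whose status is 'Undefined', then group the remaining rows by ID.
df_grp = df.loc[df['Confirmed'] != 'Undefined'].groupby('ID')
# Join each group's conditions into one string and compute the fraction of 'Bad' rows.
summary = pd.DataFrame({'Condition':df_grp['Condition'].apply(','.join),
                        'pnt_bad':df_grp['Condition'].apply(lambda x: sum(x=='Bad')/len(x))})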

Note that this approach drops any ID whose records all have 'Undefined' status (here D0110), so those IDs do not appear in the summary.
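
If those IDs should stay in the summary, one option is to keep every row and count an 'Undefined' row as not bad instead of filtering it out. That rule is my assumption, inferred from the required output listing D0110 with 0; the question does not state it. A rough sketch, where the name pct_bad is also just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['D0119', 'D0119', 'D0117', 'D0110', 'D1011', 'D1011', 'D1001', 'D1001'],
    'Condition': ['Bad', 'Good', 'Bad', 'Bad', 'Bad', 'Good', 'Bad', 'Bad'],
    'Confirmed': ['Yes', 'No', 'Yes', 'Undefined', 'Yes', 'Yes', 'Yes', 'Yes'],
})

# Assumption: a row counts as bad only when Condition is 'Bad'
# and Confirmed is anything other than 'Undefined'.
counts_as_bad = (df['Condition'].eq('Bad') & df['Confirmed'].ne('Undefined')).astype(float)

# Mean per ID over ALL rows, so IDs with only 'Undefined' rows are kept (at 0).
pct_bad = counts_as_bad.groupby(df['ID'], sort=False).mean() * 100

print(pct_bad)
```

With the sample data this gives 0 for D0110 while the other IDs keep the values from the unfiltered calculation.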

1 Comment

Thanks so much guys. Your solutions worked fine. Apologies for my lateness in replying.
