I have the following data frame and need to apply an aggregate function to one specific column, which I pass as values. I am using pandas' pivot_table for that.

 Sample ID  Type    Score   Freq
AE01    AAA Non 0.65    1
AE01    BBB IND 0.57    14
AE03    SAS IND 0.56    14
AE03    SAP IND 0.689   15
AE03    TCS IND 0.56    16
AE05    BBB IND 0.85    17
AE05    CTC IND 0.45    18
AE05    CTC Non 0.15    19
AE05    CTC Non 0.14    20
AE05    CTC Non 0.4678  21

This is the script I used:

table_pat_rel = pd.pivot_table(df,index=["ID",'Type'],values=['Sample'],
               aggfunc={'Sample':np.size})

It gives the following output:

ID  Type    Sample
AAA Non 1
BBB IND 2
SAS IND 1
SAP IND 1
TCS IND 1
CTC IND 5

But I am aiming for the following output:

ID  Recurrence  Sample
AAA 1   AE01
BBB 2   AE01 
        AE05
SAS 1   AE03
SAP 1   AE03
TCS 1   AE03
CTC 4   AE05

I also tried groupby, as follows:

 df.drop_duplicates(['Sample', 'ID']).groupby(['ID','Sample']).size().sort_values(ascending=True).head()

1 Answer

Data:

import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
{'Score': [0.65, 0.57, 0.56, 0.689, 0.56, 0.85, 0.45, 0.15, 0.14, 0.4678], 
'ID': ['AAA', 'BBB', 'SAS', 'SAP', 'TCS', 'BBB', 'CTC', 'CTC', 'CTC', 'CTC'], 
'Sample': ['AE01', 'AE01', 'AE03', 'AE03', 'AE03', 'AE05', 'AE05', 'AE05', 'AE05', 'AE05'], 
'Freq': [1, 14, 14, 15, 16, 17, 18, 19, 20, 21], 
'Type': ['Non', 'IND', 'IND', 'IND', 'IND', 'IND', 'IND', 'IND', 'IND', 'IND']},
columns=['Sample','ID','Type','Score','Freq'])
print (df)
  Sample   ID Type   Score  Freq
0   AE01  AAA  Non  0.6500     1
1   AE01  BBB  IND  0.5700    14
2   AE03  SAS  IND  0.5600    14
3   AE03  SAP  IND  0.6890    15
4   AE03  TCS  IND  0.5600    16
5   AE05  BBB  IND  0.8500    17
6   AE05  CTC  IND  0.4500    18
7   AE05  CTC  IND  0.1500    19
8   AE05  CTC  IND  0.1400    20
9   AE05  CTC  IND  0.4678    21

orig = pd.pivot_table(df,index=["ID",'Type'],values=['Sample'],
               aggfunc={'Sample':np.size})

print (orig)
          Sample
ID  Type        
AAA Non        1
BBB IND        2
CTC IND        4
SAP IND        1
SAS IND        1
TCS IND        1

I think you need to swap Sample and Type in the index, and use values=['Freq'] instead of values=['Sample']. It seems any other column not used in the index would work as the value too, because aggfunc=len (same as aggfunc='size') only counts the rows in each group:

table_pat_rel1 = pd.pivot_table(df,index=["ID",'Sample'],values=['Freq'],aggfunc=len) \
                  .reset_index(level=1) \
                  .rename(columns={'Freq':'Recurrence'}) \
                  .set_index('Recurrence', append=True)
print (table_pat_rel1)
               Sample
ID  Recurrence       
AAA 1            AE01
BBB 1            AE01
    1            AE05
CTC 4            AE05
SAP 1            AE03
SAS 1            AE03
TCS 1            AE03
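
As a quick check of the point that any column outside the index can serve as values when the aggregation only counts rows, the following sketch compares the counts obtained from Freq and from Score. The names by_freq, by_score and same_counts are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as above, limited to the columns this check needs
df = pd.DataFrame({
    'Sample': ['AE01', 'AE01', 'AE03', 'AE03', 'AE03',
               'AE05', 'AE05', 'AE05', 'AE05', 'AE05'],
    'ID': ['AAA', 'BBB', 'SAS', 'SAP', 'TCS',
           'BBB', 'CTC', 'CTC', 'CTC', 'CTC'],
    'Score': [0.65, 0.57, 0.56, 0.689, 0.56, 0.85, 0.45, 0.15, 0.14, 0.4678],
    'Freq': [1, 14, 14, 15, 16, 17, 18, 19, 20, 21],
})

# Count rows per (ID, Sample) using two different value columns
by_freq = pd.pivot_table(df, index=['ID', 'Sample'], values=['Freq'], aggfunc=len)
by_score = pd.pivot_table(df, index=['ID', 'Sample'], values=['Score'], aggfunc=len)

# The row counts should not depend on which value column was chosen
same_counts = (by_freq['Freq'].to_numpy() == by_score['Score'].to_numpy()).all()
print(same_counts)
```

This only holds as long as the chosen column has no missing values handling that changes the grouping; here both columns are fully populated.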

Or use groupby with aggregating size:

table_pat_rel2 = df.groupby(['ID','Sample']) \
                   .size() \
                   .reset_index(level=1) \
                   .rename(columns={0:'Recurrence'}) \
                   .set_index('Recurrence', append=True)

print (table_pat_rel2)
               Sample
ID  Recurrence       
AAA 1            AE01
BBB 1            AE01
    1            AE05
CTC 4            AE05
SAP 1            AE03
SAS 1            AE03
TCS 1            AE03
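
If the goal is instead a count per ID across all samples, so that BBB shows 2 next to both AE01 and AE05 as in the desired output of the question, one possible sketch is to compute the per-ID size with transform before dropping duplicate ID/Sample pairs. This is an assumption about what the asker intends, and the names recurrence and table_pat_rel3 are illustrative.

```python
import pandas as pd

# Same sample data as above, limited to the two columns that matter here
df = pd.DataFrame({
    'Sample': ['AE01', 'AE01', 'AE03', 'AE03', 'AE03',
               'AE05', 'AE05', 'AE05', 'AE05', 'AE05'],
    'ID': ['AAA', 'BBB', 'SAS', 'SAP', 'TCS',
           'BBB', 'CTC', 'CTC', 'CTC', 'CTC'],
})

# Number of rows per ID, broadcast back to every row of that ID
recurrence = df.groupby('ID')['Sample'].transform('size')

table_pat_rel3 = (
    df.assign(Recurrence=recurrence)
      .drop_duplicates(['ID', 'Sample'])       # keep one row per ID/Sample pair
      .set_index(['ID', 'Recurrence'])[['Sample']]
      .sort_index()
)
print(table_pat_rel3)
```

With this approach the count is taken over all rows of an ID, while the Sample column still lists each distinct sample once.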

4 Comments

Hello, thanks for the solution, but I have edited my expected output. Could you please check it now?
Thank you. Why does BBB appear twice? Is it a typo?
It seems one row with BBB is missing.
I tried editing the answer to use your sample DataFrame. The output is the same, except for the BBB rows: two rows with 1 versus a single 2. If that is a problem, can you explain more? Thank you.
