Transforming a DataFrame with duplicate data in Python

Question

I would like to transform the dataframe below so that rows with duplicate data are concatenated into a single row. For example:

import pandas as pd

data_dict={'FromTo_U': {0: 'L->R', 1: 'L->R', 2: 'S->I'},
     'GeneName': {0: 'EGFR', 1: 'EGFR', 2: 'EGFR'},
     'MutationAA_C': {0: 'p.L858R', 1: 'p.L858R', 2: 'p.S768I'},
     'MutationDescription': {0: 'Substitution - Missense',
      1: 'Substitution - Missense',
      2: 'Substitution - Missense'},
     'PubMed': {0: '22523351', 1: '23915069', 2: '26862733'},
     'VariantID': {0: 'COSM12979', 1: 'COSM12979', 2: 'COSM18486'},
     'VariantPos_U': {0: '858', 1: '858', 2: '768'},
     'VariantSource': {0: 'COSMIC', 1: 'COSMIC', 2: 'COSMIC'}}
df1=pd.DataFrame(data_dict)

The transformed dataframe should be:

data_dict_t={'FromTo_U': {0: 'L->R', 2: 'S->I'},
 'GeneName': {0: 'EGFR', 2: 'EGFR'},
 'MutationAA_C': {0: 'p.L858R', 2: 'p.S768I'},
 'MutationDescription': {0: 'Substitution - Missense',2: 'Substitution - Missense'},
 'PubMed': {0: '22523351,23915069', 2: '26862733'},
 'VariantID': {0: 'COSM12979', 2: 'COSM18486'},
 'VariantPos_U': {0: '858',  2: '768'},
 'VariantSource': {0: 'COSMIC', 2: 'COSMIC'}}

I want to merge two rows of df1 only if their PubMed IDs differ and all the other columns hold the same data. Thanks in advance!

cs95 · Accepted Answer · 2018-01-28 05:09:10Z


Use groupby + agg with str.join as the aggfunc: group on every column except PubMed, then join the PubMed values within each group.

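# Group keys: every column except PubMed
c = df1.columns.difference(['PubMed']).tolist()
# Join the PubMed IDs of each group into one comma-separated string
df1.groupby(c, as_index=False)['PubMed'].agg(','.join)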

  FromTo_U GeneName MutationAA_C      MutationDescription  VariantID  \
0     L->R     EGFR      p.L858R  Substitution - Missense  COSM12979   
1     S->I     EGFR      p.S768I  Substitution - Missense  COSM18486   

  VariantPos_U VariantSource             PubMed  
0          858        COSMIC  22523351,23915069  
1          768        COSMIC           26862733
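
For reference, here is a self-contained sketch of the same groupby + agg approach, rebuilt on the question's sample data. The helper name merge_pubmed and its id_col and sep parameters are hypothetical, chosen only for illustration. Two details go beyond the original answer and are assumptions about what is wanted: repeated PubMed IDs within a group are dropped before joining, and the original row and column order is kept.

```python
import pandas as pd

# Sample data from the question, written as plain lists
df1 = pd.DataFrame({
    'FromTo_U': ['L->R', 'L->R', 'S->I'],
    'GeneName': ['EGFR', 'EGFR', 'EGFR'],
    'MutationAA_C': ['p.L858R', 'p.L858R', 'p.S768I'],
    'MutationDescription': ['Substitution - Missense'] * 3,
    'PubMed': ['22523351', '23915069', '26862733'],
    'VariantID': ['COSM12979', 'COSM12979', 'COSM18486'],
    'VariantPos_U': ['858', '858', '768'],
    'VariantSource': ['COSMIC'] * 3,
})


def merge_pubmed(df, id_col='PubMed', sep=','):
    """Collapse rows that agree on every column except id_col.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Group keys: every column except the one being concatenated
    keys = df.columns.difference([id_col]).tolist()
    merged = (
        # sort=False keeps groups in order of first appearance
        df.groupby(keys, as_index=False, sort=False)[id_col]
          # Assumption: drop repeated IDs within a group before joining
          .agg(lambda s: sep.join(s.drop_duplicates()))
    )
    # columns.difference sorts the keys, so restore the original column order
    return merged[df.columns.tolist()]


result = merge_pubmed(df1)
print(result[['VariantID', 'PubMed']])
```

Note that groupby drops rows whose key columns contain NaN by default; if the real data has missing values in the grouping columns, passing dropna=False to groupby (pandas 1.1 or later) keeps those rows.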
