Combine value using python/pandas

Question

I am using Python with pandas and have a question about combining rows. I have an Excel file with the data below.

   C1  C2  C3  C4     C5     C6  ID  Value
0  aa  ee  ii  mm  aaaaa   bbbb   1    100
1  bb  ff  jj  nn   cccc  ddddd   2     50
2  aa  ee  ii  mm   eeee   ffff   3     20
3  dd  hh  ll  pp   gggg   hhhh   4     10
4  aa  ee  ii  mm   abcd   efgh   5      5
5  bb  ff  jj  nn  aaaaa   bbbb   6      2

Code to reproduce:

import pandas as pd

df = pd.DataFrame({'Value': [100,50,20,10,5,2],
'ID': [1,2,3,4,5,6],
'C1': ['aa','bb','aa','dd','aa','bb'],
'C2': ['ee','ff','ee','hh','ee','ff'],
'C3': ['ii','jj','ii','ll','ii','jj'],
'C4': ['mm','nn','mm','pp','mm','nn'],
'C5': ['aaaaa','cccc','eeee','gggg','abcd','aaaaa'],
'C6': ['bbbb','ddddd','ffff','hhhh','efgh','bbbb']})

Some rows are duplicates in columns C1 to C4 (for example, ID 1, ID 3 and ID 5 are duplicates of each other, as are ID 2 and ID 6). Is there any way to combine the duplicate rows? (I am only comparing columns C1 to C4; I do not care about C5 and C6.)

I want to sum the "Value" of the duplicate rows and keep the other columns of the first row in each group. For example, here is the output I want to produce.

    Value   ID  C1  C2  C3  C4  C5      C6
0   125     1   aa  ee  ii  mm  aaaaa   bbbb
1   52      2   bb  ff  jj  nn  cccc    ddddd
2   10      4   dd  hh  ll  pp  gggg    hhhh

If you could give me your opinion, I would be grateful for that very much.

niraj · Accepted Answer · 2018-05-18 17:46:21Z

3

There may be more efficient ways, but one approach is:

Create new_df by keeping only the first occurrence of each combination of Column1 to Column4.
Then group the original df by the same columns, sum Value, and write the sums into new_df.

You can try as shown below:

key_cols = ['Column1', 'Column2', 'Column3', 'Column4']
# keep the first row of each duplicate group; drop=True discards the old index
new_df = df.drop_duplicates(subset=key_cols, keep='first').reset_index(drop=True)
# sort=False keeps groups in first-appearance order, matching the row order of new_df
new_df['Value'] = df.groupby(key_cols, sort=False)['Value'].sum().values
print(new_df)

Result:

   ID  Value Column1 Column2 Column3 Column4 Column5 Column6
0   1    125      aa      ee      ii      mm   aaaaa    bbbb
1   2     52      bb      ff      jj      nn    cccc   ddddd
2   4     10      dd      hh      ll      pp    gggg    hhhh

Update:

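The same approach, using the column names from the edited question:

key_cols = ['C1', 'C2', 'C3', 'C4']
# keep the first row of each duplicate group; drop=True discards the old index
new_df = df.drop_duplicates(subset=key_cols, keep='first').reset_index(drop=True)
# sort=False keeps groups in first-appearance order, matching the row order of new_df
new_df['Value'] = df.groupby(key_cols, sort=False)['Value'].sum().values
print(new_df)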

Result:

   C1  C2  C3  C4     C5     C6  ID  Value
0  aa  ee  ii  mm  aaaaa   bbbb   1    125
1  bb  ff  jj  nn   cccc  ddddd   2     52
2  dd  hh  ll  pp   gggg   hhhh   4     10
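
A variant of the same idea, sketched under the assumption that all four columns C1 to C4 together define a duplicate. Here `transform('sum')` writes each group's total back onto every row by index, so nothing depends on the row order of a separate groupby result. The names `key_cols`, `totals` and `combined` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Value': [100, 50, 20, 10, 5, 2],
                   'ID': [1, 2, 3, 4, 5, 6],
                   'C1': ['aa', 'bb', 'aa', 'dd', 'aa', 'bb'],
                   'C2': ['ee', 'ff', 'ee', 'hh', 'ee', 'ff'],
                   'C3': ['ii', 'jj', 'ii', 'll', 'ii', 'jj'],
                   'C4': ['mm', 'nn', 'mm', 'pp', 'mm', 'nn'],
                   'C5': ['aaaaa', 'cccc', 'eeee', 'gggg', 'abcd', 'aaaaa'],
                   'C6': ['bbbb', 'ddddd', 'ffff', 'hhhh', 'efgh', 'bbbb']})

# Assumption: a row is a duplicate when all four of these columns match.
key_cols = ['C1', 'C2', 'C3', 'C4']

# Give every row the total Value of its group, aligned on the original index.
totals = df.groupby(key_cols)['Value'].transform('sum')

# Keep only the first row of each group; ID, C5 and C6 come from that row.
combined = (df.assign(Value=totals)
              .drop_duplicates(subset=key_cols, keep='first')
              .reset_index(drop=True))

print(combined)
```

For the sample data this keeps the rows with ID 1, 2 and 4, with Value 125, 52 and 10.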

edited May 18, 2018 at 17:46

answered May 18, 2018 at 17:09

niraj


1 Comment

Tom_Hanks Over a year ago

Thank you so much!

jpp · Accepted Answer · 2018-05-18 16:59:49Z

1

You can use groupby.agg. I assume you wish to sum value and take the first id for each group, as in your desired output. Here's a minimal example:

import pandas as pd

df = pd.DataFrame([[100, 1, 'a', 'b'], [20, 2, 'a', 'b'],
                   [15, 3, 'c', 'd'], [5, 4, 'a', 'b'],
                   [25, 5, 'c', 'd']], columns=['value', 'id', 'col1', 'col2'])

# take the first id and the summed value for each (col1, col2) group
res = df.groupby(['col1', 'col2']).agg({'id': 'first', 'value': 'sum'}).reset_index()

print(res)

  col1 col2  id  value
0    a    b   1    125
1    c    d   3     40
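
Applied to the frame from the question, the same `groupby.agg` pattern might look like the sketch below. It assumes C1 to C4 are the grouping keys and that ID, C5 and C6 should come from the first row of each group; `key_cols` is an illustrative name. `sort=False` keeps the groups in order of first appearance rather than sorted by key.

```python
import pandas as pd

df = pd.DataFrame({'Value': [100, 50, 20, 10, 5, 2],
                   'ID': [1, 2, 3, 4, 5, 6],
                   'C1': ['aa', 'bb', 'aa', 'dd', 'aa', 'bb'],
                   'C2': ['ee', 'ff', 'ee', 'hh', 'ee', 'ff'],
                   'C3': ['ii', 'jj', 'ii', 'll', 'ii', 'jj'],
                   'C4': ['mm', 'nn', 'mm', 'pp', 'mm', 'nn'],
                   'C5': ['aaaaa', 'cccc', 'eeee', 'gggg', 'abcd', 'aaaaa'],
                   'C6': ['bbbb', 'ddddd', 'ffff', 'hhhh', 'efgh', 'bbbb']})

# Assumption: these four columns decide which rows count as duplicates.
key_cols = ['C1', 'C2', 'C3', 'C4']

# Sum Value per group; for the other columns take the first value in each
# group (note that 'first' returns the first non-null value).
res = (df.groupby(key_cols, sort=False, as_index=False)
         .agg({'Value': 'sum', 'ID': 'first', 'C5': 'first', 'C6': 'first'}))

# Restore the column order of the original frame.
res = res[df.columns.tolist()]

print(res)
```

Any further column to carry over can be added to the dictionary with `'first'`, or with another aggregation if that suits the data better.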

answered May 18, 2018 at 16:59

jpp


3 Comments

Tom_Hanks Over a year ago

Thanks, jpp. However, I would like to leave the ID, column5 and column6 of the top row in the original file.

jpp Over a year ago

@Tom_Hanks, Then just add to your dictionary, e.g. 'col5': 'first', etc, if you wish to keep (any number of) other columns.

rafaelc Over a year ago

@jpp sure. Sometimes OPs are beginners and might have trouble generalizing solutions, but I think this is pretty straight forward ;}
