Get Duplicated Rows in Dataframe and Overwrite them Python

Question

I have the following Dataframe:

index	errorId	start	end	timestamp	uniqueId
0	1404	2022-04-25 02:10:41	2022-04-25 02:10:46	2022-04-25	1404_2022-04-25
1	1302	2022-04-25 02:10:41	2022-04-25 02:10:46	2022-04-25	1302_2022-04-25
2	1404	2022-04-27 12:54:46	2022-04-27 12:54:51	2022-04-25	1404_2022-04-25
3	1302	2022-04-27 13:34:43	2022-04-27 13:34:50	2022-04-25	1302_2022-04-25
4	1404	2022-04-29 04:30:22	2022-04-29 04:30:29	2022-04-25	1404_2022-04-25
5	1302	2022-04-29 08:26:25	2022-04-29 08:26:32	2022-04-25	1302_2022-04-25

The unique_ID is a combination from the column errorId and uniqueId. It should be checked whether the column 'uniqueID' contains a duplicate value. If this is the case, the row should be taken where it appears for the first time. In the example for errorId 1404, it would be the column at index 0. Afterwards, the value in the column 'end' should be overwritten with the value where it appears for the last time. In the example here, at index 4.
The same for errorId 1302

In the End it should look like this:

index	errorId	start	end	timestamp	uniqueId
0	1404	2022-04-25 02:10:41	2022-04-29 04:30:29	2022-04-25	1404_2022-04-25
1	1302	2022-04-25 02:10:41	2022-04-29 08:26:32	2022-04-25	1302_2022-04-25

jezrael · Accepted Answer · 2023-01-09 10:00:17Z

2

I think you need aggragate min and max per 3 columns with named aggregation, last for same order of columns like original add DataFrame.reindex:

df1 = (df.groupby(['errorId','timestamp','uniqueId'], as_index=False, sort=False)
         .agg(start=('start','min'), end=('end','max'))
         .reindex(df.columns, axis=1))

Or aggregate by first and last, if datetimes are sorted per groups get same ouput:

df2 = (df.groupby(['errorId','timestamp','uniqueId'], as_index=False, sort=False)
         .agg(start=('start','first'), end=('end','last'))
         .reindex(df.columns, axis=1))

edited Jan 9, 2023 at 10:00

answered Jan 9, 2023 at 9:55

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Get Duplicated Rows in Dataframe and Overwrite them Python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related