I have the following Dataframe:
| index | errorId | start | end | timestamp | uniqueId |
|---|---|---|---|---|---|
| 0 | 1404 | 2022-04-25 02:10:41 | 2022-04-25 02:10:46 | 2022-04-25 | 1404_2022-04-25 |
| 1 | 1302 | 2022-04-25 02:10:41 | 2022-04-25 02:10:46 | 2022-04-25 | 1302_2022-04-25 |
| 2 | 1404 | 2022-04-27 12:54:46 | 2022-04-27 12:54:51 | 2022-04-25 | 1404_2022-04-25 |
| 3 | 1302 | 2022-04-27 13:34:43 | 2022-04-27 13:34:50 | 2022-04-25 | 1302_2022-04-25 |
| 4 | 1404 | 2022-04-29 04:30:22 | 2022-04-29 04:30:29 | 2022-04-25 | 1404_2022-04-25 |
| 5 | 1302 | 2022-04-29 08:26:25 | 2022-04-29 08:26:32 | 2022-04-25 | 1302_2022-04-25 |
The unique_ID is a combination from the column errorId and uniqueId.
It should be checked whether the column 'uniqueID' contains a duplicate value. If this is the case, the row should be taken where it appears for the first time. In the example for errorId 1404, it would be the column at index 0. Afterwards, the value in the column 'end' should be overwritten with the value where it appears for the last time. In the example here, at index 4.
The same for errorId 1302
In the End it should look like this:
| index | errorId | start | end | timestamp | uniqueId |
|---|---|---|---|---|---|
| 0 | 1404 | 2022-04-25 02:10:41 | 2022-04-29 04:30:29 | 2022-04-25 | 1404_2022-04-25 |
| 1 | 1302 | 2022-04-25 02:10:41 | 2022-04-29 08:26:32 | 2022-04-25 | 1302_2022-04-25 |