I have a dataframe like this:
| id | year | data_1 | data_2 |
|---|---|---|---|
| A | 2019 | nan | 11 |
| A | 2019 | abc | 11 |
| A | 2020 | nan | 22 |
| B | 2019 | 345 | nan |
| B | 2019 | nan | 456 |
| B | 2020 | 234 | 33 |
I want to identify duplicated rows based on some columns ("id" and "year" in this case) and merge the remaining columns, i.e. for each column of a given id and year, keep the non-`np.nan` value:
| id | year | data_1 | data_2 |
|---|---|---|---|
| A | 2019 | abc | 11 |
| A | 2020 | nan | 22 |
| B | 2019 | 345 | 456 |
| B | 2020 | 234 | 33 |
I can find all duplicated rows (which is easy) but can't think of how to "merge" them by replacing `np.nan` with the values that are present.
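One sketch of what I mean, assuming each ("id", "year") group has at most one non-NaN value per column, using `groupby(...).first()` (which returns the first non-NaN entry per column within each group):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":     ["A", "A", "A", "B", "B", "B"],
    "year":   [2019, 2019, 2020, 2019, 2019, 2020],
    "data_1": [np.nan, "abc", np.nan, 345, np.nan, 234],
    "data_2": [11, 11, 22, np.nan, 456, 33],
})

# GroupBy.first() skips NaN by default, so duplicated (id, year)
# rows collapse into one row keeping the available values.
merged = df.groupby(["id", "year"], as_index=False).first()
print(merged)
```

Note this silently keeps only the first value if a group ever has two conflicting non-NaN entries in the same column, so it is only safe when duplicates genuinely disagree only via NaN.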