I have a dataframe like this:

id year data_1 data_2
A 2019 nan 11
A 2019 abc 11
A 2020 nan 22
B 2019 345 nan
B 2019 nan 456
B 2020 234 33

I want to identify duplicated rows based on some columns ("id" and "year" in this case) and merge the remaining columns, i.e. for each column of a given id and year, keep the non-np.nan value:

id year data_1 data_2
A 2019 abc 11
A 2020 nan 22
B 2019 345 456
B 2020 234 33

I can find all duplicated rows (which is easy) but can't think of how to "merge" by replacing np.nan with values.
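
For reference, a minimal sketch of the "finding duplicates" step, assuming the sample table above is rebuilt by hand (the variable names `df`, `dup_mask`, and `dupes` are illustrative, not from my real code):

```python
import numpy as np
import pandas as pd

# Reconstruction of the sample data shown above (data_1 mixes strings and numbers).
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2019, 2020, 2019, 2019, 2020],
    "data_1": [np.nan, "abc", np.nan, 345, np.nan, 234],
    "data_2": [11, 11, 22, np.nan, 456, 33],
})

# keep=False flags every row of a duplicated (id, year) pair, not only the later ones.
dup_mask = df.duplicated(subset=["id", "year"], keep=False)
dupes = df[dup_mask]
print(dupes)
```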

  • @timgeb no there can be more. Commented Aug 23, 2021 at 7:52
  • okay, understood Commented Aug 23, 2021 at 7:54
  • @timgeb Ha sorry, if you mean for each id, year, and column, then yes, there is always at most one non-nan value. Actually, there will only be two duplicated rows, so there can't be more than one non-nan value for each column. Commented Aug 23, 2021 at 7:56

1 Answer

Something that will work in this particular case is taking the max per group:

df.groupby(['id', 'year'], as_index=False).max()

output (computed with an all-numeric data_1, i.e. 123 where the question now shows abc):

  id  year  data_1  data_2
0  A  2019   123.0    11.0
1  A  2020     NaN    22.0
2  B  2019   345.0   456.0
3  B  2020   234.0    33.0

However, this might not work if you have duplicates without NaNs. In that case, please provide an updated example and the rules for merging.
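
To illustrate the caveat, here is a small sketch on made-up data (the values below are hypothetical, not from the question). When a group holds two different non-NaN values in the same column, max() silently keeps the larger one. Counting distinct non-NaN values per group with nunique() is one way to spot such conflicts before merging:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group ("A", 2019) has two different non-NaN values in data_1.
df = pd.DataFrame({
    "id": ["A", "A", "B"],
    "year": [2019, 2019, 2019],
    "data_1": [5.0, 7.0, np.nan],
    "data_2": [11.0, np.nan, 3.0],
})

# nunique() ignores NaN by default, so a count above 1 means conflicting values.
conflicts = df.groupby(["id", "year"]).nunique().gt(1)
has_conflict = bool(conflicts.any().any())

# max() keeps 7.0 for ("A", 2019) and drops 5.0 without any warning.
merged = df.groupby(["id", "year"], as_index=False).max()
print(conflicts)
print(merged)
```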

Here is a quick fix of the above for mixed types: convert to string, do the merge, then convert back to float where possible. However, mixing types in a single column is not really good practice.

(df.fillna('').astype(str)
   .groupby(['id', 'year'], as_index=False).max()
   .astype(float, errors='ignore')
   .replace('', float('nan'))
)
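
A possible alternative that avoids the string round trip (this is a different technique from the max-based snippets above): GroupBy.first() skips NA values by default, so it returns the first non-null entry of each column within each group. A minimal sketch, assuming the question's sample data with its mixed-type data_1 column rebuilt by hand:

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's sample data (assumed, data_1 mixes str and int).
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2019, 2020, 2019, 2019, 2020],
    "data_1": [np.nan, "abc", np.nan, 345, np.nan, 234],
    "data_2": [11, 11, 22, np.nan, 456, 33],
})

# first() takes the first non-null value per column in each (id, year) group;
# a column that is entirely null within a group stays null.
merged = df.groupby(["id", "year"], as_index=False).first()
print(merged)
```

Like max(), this relies on there being at most one distinct non-NaN value per column in each group. If there are several, first() keeps whichever comes first in row order.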

Comments

What if there are non-numerical values like string?
please provide an example and the expected output
@GrumpyCivet I provided a fix. Do you really have mixed strings and floats in the same column?
OK, then fillna with empty string (if not an issue) in the string columns and the first answer will work
You mean you want to keep the same rows and ffill? If this doesn't help, please open a new question, as this is a different problem.