Good day,
I need a way to check each row of a DataFrame and drop the row if all the values in its score columns are the same. The person_id column should be ignored in this comparison, since it differs from row to row.
Here is a part of the dataset:
In:
import pandas as pd
data = [[7, 10, 10, 10, 10], [17, 10, 10, 10, 10], [18, 8, 10, 10, 10], [20, 10, 10, 9, 9], [25, 9, 8, 8, 7]]
df = pd.DataFrame(data, columns=['person_id', 'score_1', 'score_2', 'score_3', 'score_4'])
df
Out:
   person_id  score_1  score_2  score_3  score_4
0          7       10       10       10       10
1         17       10       10       10       10
2         18        8       10       10       10
3         20       10       10        9        9
4         25        9        8        8        7
The desired output would be:
   person_id  score_1  score_2  score_3  score_4
2         18        8       10       10       10
3         20       10       10        9        9
4         25        9        8        8        7
Rows 0 (person_id 7) and 1 (person_id 17) are dropped because every score within each of those rows is identical (all 10s).
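A minimal sketch of one way to express this rule in pandas, assuming every score column name contains "score_" (so person_id is left out of the comparison automatically): DataFrame.nunique(axis=1) counts the distinct values in each row, and a row whose scores are all identical has a count of exactly 1.

```python
import pandas as pd

# Sample data from the question.
data = [[7, 10, 10, 10, 10], [17, 10, 10, 10, 10], [18, 8, 10, 10, 10],
        [20, 10, 10, 9, 9], [25, 9, 8, 8, 7]]
df = pd.DataFrame(data, columns=['person_id', 'score_1', 'score_2', 'score_3', 'score_4'])

# Assumption: all score columns contain "score_" in their name, so this
# picks them up however many there are and skips person_id.
scores = df.filter(like='score_')

# Count the distinct values in each row; a row whose scores are all the
# same has exactly one distinct value.
keep = scores.nunique(axis=1) > 1

result = df[keep]
print(result)
```

By default nunique ignores NaN, so a row such as 10, NaN, 10 would count as having one distinct value and be dropped; passing dropna=False makes NaN count as a value of its own.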
The number of score columns will also change (more will be added), so I cannot hard-code the comparisons like this:
df_no_duplicates = df.loc[(df.score_1 != df.score_2) | (df.score_2 != df.score_3)| (df.score_3 != df.score_4)]
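If you would rather keep the logic of the chained comparison above, it can be built for any number of columns by comparing each score column with its right-hand neighbour and keeping a row when any neighbouring pair differs. This is a sketch under the same assumption that score columns start with "score_"; score_cols, left, right and mask are illustrative names only.

```python
import pandas as pd

# Sample data from the question.
data = [[7, 10, 10, 10, 10], [17, 10, 10, 10, 10], [18, 8, 10, 10, 10],
        [20, 10, 10, 9, 9], [25, 9, 8, 8, 7]]
df = pd.DataFrame(data, columns=['person_id', 'score_1', 'score_2', 'score_3', 'score_4'])

# Assumption: score columns are identified by the "score_" prefix, so adding
# score_5, score_6, ... needs no code change.
score_cols = [c for c in df.columns if c.startswith('score_')]

# Generalizes (score_1 != score_2) | (score_2 != score_3) | ... :
# line up each column with its right-hand neighbour as 2-D arrays.
left = df[score_cols[:-1]].to_numpy()
right = df[score_cols[1:]].to_numpy()

# A row is kept if at least one neighbouring pair of scores differs.
mask = (left != right).any(axis=1)

df_no_duplicates = df.loc[mask]
print(df_no_duplicates)
```

Because NaN != NaN evaluates to True, any row containing a missing score is kept by this version, which differs from the default nunique behaviour.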