I am writing Python code to drop specific rows from a DataFrame based on the 'desc' column, depending on the value in the 'label' column. I focus on two labels: 'L1' and 'arc'.
Some 'L1' rows have the same 'desc' as 'arc' rows. For such L1 rows, I want to rename the label to 'L1arc' and drop the matching arc row, since it is a duplicate. I do not want to remove all duplicates in the 'desc' column, though: rows that share a description but carry other labels should be kept.
The dataframe looks like below:
    desc                             label  lang
0   The sky is blue                  L1     en
1   Design tech                      L2     en
2   Design tech                      L3     en
3   Silverline clouds                PM     en
4   No event data                    L1     en
5   TouchStatus shall be calculated  L1     en
6   160 fps                          arc    en
7   Failure detection specified      L1     en
8   160 fps                          L1     en
9   No event data                    arc    en
10  Design tech                      L1     en
Here is the code I tried:
sample.sort_values('label', ascending=False).drop_duplicates('desc').sort_index()
The problem is that the above code also drops duplicate 'desc' rows for the other labels (L2, L3, and L1 itself), which I want to retain. How can I remove only specific duplicates in a column?
Expected output:
   desc                             label
0  The sky is blue                  L1
1  Design tech                      L2
2  Design tech                      L3
3  Design tech                      L1
4  Silverline clouds                PM
5  No event data                    L1arc
6  TouchStatus shall be calculated  L1
7  Failure detection specified      L1
8  160 fps                          L1arc
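
For reference, here is a minimal sketch of the rule described above, written with boolean masks instead of drop_duplicates. It rebuilds the sample frame from the table in this post; the helper names (is_l1, is_arc, shared, in_shared, result) are made up for illustration. It assumes that an 'arc' row should only be dropped when an 'L1' row with the same 'desc' exists, and it keeps the original row order, so it returns the same rows as the expected output but 'Design tech' / L1 stays at the end rather than moving next to L2 and L3.

```python
import pandas as pd

# Sample data copied from the table in the question.
sample = pd.DataFrame({
    "desc": [
        "The sky is blue", "Design tech", "Design tech", "Silverline clouds",
        "No event data", "TouchStatus shall be calculated", "160 fps",
        "Failure detection specified", "160 fps", "No event data", "Design tech",
    ],
    "label": ["L1", "L2", "L3", "PM", "L1", "L1", "arc", "L1", "L1", "arc", "L1"],
    "lang": ["en"] * 11,
})

# Boolean masks for the two labels of interest.
is_l1 = sample["label"].eq("L1")
is_arc = sample["label"].eq("arc")

# Descriptions that occur under both an 'L1' row and an 'arc' row.
shared = set(sample.loc[is_l1, "desc"]) & set(sample.loc[is_arc, "desc"])
in_shared = sample["desc"].isin(shared)

result = sample.copy()
# Rename the L1 rows that have an arc twin ...
result.loc[is_l1 & in_shared, "label"] = "L1arc"
# ... and drop only those arc twins; all other duplicates are left alone.
result = result.loc[~(is_arc & in_shared), ["desc", "label"]].reset_index(drop=True)

print(result)
```

Because only rows matching both a label mask and the shared-description mask are touched, the 'Design tech' rows for L2, L3 and L1 all survive. An 'arc' row with no matching 'L1' description would also be kept under this assumption.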