I need to modify a Python pandas DataFrame. Consider
Id Col
1 a
2 a
3 p
3 sp
4 n
4 sn
5 b
6 c
is my dataframe. Ids 3 and 4 appear twice. For the rows with Id 3, Col has the values p and sp. Similarly, for Id 4, Col has the values n and sn. I want to remove the row where Col is p for Id 3 and the row where Col is n for Id 4, so I want my dataframe to look like
Id Col
1 a
2 a
3 sp
4 sn
5 b
6 c
So basically, here is what I need to do:
Check if there are any duplicate Ids. Let's assume that duplicates only occur in pairs, not in triples or more.
Then, if the values of Col are the same, keep only one such row.
- If the values in Col are p and sp, I want to keep the row that has sp.
- If the values in Col are n and sn, I want to keep the row that has sn.
How can I achieve this?
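For illustration, here is a minimal sketch of one way the rules above could be expressed. It is only a sketch under assumptions: the `priority` mapping is a hypothetical ranking made up for this example (sp and sn outrank everything else), and the dataframe is rebuilt by hand from the table above.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3, 3, 4, 4, 5, 6],
    "Col": ["a", "a", "p", "sp", "n", "sn", "b", "c"],
})

# Hypothetical ranking (an assumption): a higher number wins,
# and any value not listed here gets rank 0.
priority = {"sp": 1, "sn": 1}
rank = df["Col"].map(priority).fillna(0)

# For each Id, find the index label of the highest-ranked row.
# On ties (e.g. two identical values), the first row is kept.
keep = rank.groupby(df["Id"]).idxmax()

result = df.loc[keep].reset_index(drop=True)
print(result)
```

Because ties resolve to the first row, this also covers the case where both rows of a pair have the same value in Col.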
EDIT
Actually, ideally I would need to check before deciding which row to drop. Let's say I know that there are multiple rows with Id 3 and the corresponding values of Col are
p
sp
Now I want to collect these values in a list as
['p','sp']
and send it to a function like
def giveMeBest(paramList):
    bestVal = ""
    for param in paramList:
        '''
        some logic goes here
        '''
    return bestVal
Then I only keep the row that has the value bestVal in Col. Note that this would also allow me to handle any number of duplicates.
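A rough sketch of how that idea might look with a groupby, assuming a placeholder body for giveMeBest (picking the longest string is purely an assumption for the example, not the real logic):

```python
import pandas as pd

def giveMeBest(paramList):
    # Placeholder logic (an assumption): prefer the longest value,
    # so 'sp' beats 'p' and 'sn' beats 'n'.
    return max(paramList, key=len)

df = pd.DataFrame({
    "Id": [1, 2, 3, 3, 4, 4, 5, 6],
    "Col": ["a", "a", "p", "sp", "n", "sn", "b", "c"],
})

# One best value per Id, computed from the list of Col values in that group.
best = df.groupby("Id")["Col"].agg(lambda s: giveMeBest(s.tolist()))

# Keep only the rows whose Col equals the best value for their Id;
# drop_duplicates handles Ids where the best value appears more than once.
mask = df["Col"] == df["Id"].map(best)
result = df[mask].drop_duplicates(["Id", "Col"]).reset_index(drop=True)
print(result)
```

Since the whole list of values for an Id is passed to the function, the group size does not matter here.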
EDIT2
Thanks rurp for the answer. I just have one last request. I am trying to clean up my dataframe by doing the following
for x in result:
    resVal = getVal(x[1])
    '''
    getVal returns the appropriate value that i want to be set in
    my dataframe. Note that x[1] will denote the array of duplicate values in Col
    '''
    resData = resData[(resData.Id == x[0]) & (resData.Col != resVal)]
But this still does not delete the rows:
print(resData[resData.Id==3])
Id Col
3 p
3 sp
I even tried
resData = resData.drop(resData[(resData.Id == int(x[0])) & (resData.Col!=resSent)].index)
but it still shows the duplicate row.
How can I drop multiple rows from my dataframe?
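One thing that may be worth checking in the first attempt: indexing a dataframe with a boolean mask keeps the rows where the mask is True, so a mask that describes the unwanted rows has to be negated with `~` before filtering. A small sketch of that, where `x` and `resVal` are stand-in values assumed for the example (one entry of `result` and what getVal might return for it):

```python
import pandas as pd

resData = pd.DataFrame({
    "Id": [1, 2, 3, 3, 4, 4, 5, 6],
    "Col": ["a", "a", "p", "sp", "n", "sn", "b", "c"],
})

# Stand-ins (assumptions): one (Id, values) entry and the value to keep.
x = (3, ["p", "sp"])
resVal = "sp"

# True for the rows that should go away: same Id, but not the chosen value.
unwanted = (resData.Id == x[0]) & (resData.Col != resVal)

# Boolean indexing KEEPS the True rows, so negate the mask to remove them.
resData = resData[~unwanted]
print(resData[resData.Id == 3])
```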
Solved dropping rows
Here is how I did it:
idx = []
for x in result:
    resVal = getVal(x[1])
    idx.append(resData[(resData.Id == x[0]) & (resData.Col != resVal)].index.tolist())
and then, just
for j in idx:
    resData = resData.drop(j)
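As a possible variation on the above, the index labels could be collected into one flat list with `extend` and dropped in a single call. This is only a sketch: `result` and `getVal` here are stand-ins assumed for the example (a list of (Id, values) pairs and a function that picks the longest value), not the real ones.

```python
import pandas as pd

resData = pd.DataFrame({
    "Id": [1, 2, 3, 3, 4, 4, 5, 6],
    "Col": ["a", "a", "p", "sp", "n", "sn", "b", "c"],
})

# Stand-in (assumption): (Id, duplicate Col values) pairs.
result = [(3, ["p", "sp"]), (4, ["n", "sn"])]

def getVal(values):
    # Placeholder logic (an assumption): prefer the longest value.
    return max(values, key=len)

idx = []
for x in result:
    resVal = getVal(x[1])
    # extend adds the labels themselves, giving one flat list.
    idx.extend(resData[(resData.Id == x[0]) & (resData.Col != resVal)].index)

# Drop every collected row label in one call.
resData = resData.drop(idx)
print(resData)
```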