I have a fairly simple problem, but I can't find a clean way to solve it. I would like to delete a number of rows from my dataframe based on their value in a specific column (id), but only one occurrence at a time (preferably chosen at random). Here is an example:

I have the following list of ids that I want to delete:

idsToDelete = [1,2,2,3,3]

In other words, I would like to delete one random row with id 1, two random rows with id 2, and two random rows with id 3.

I have the following dataframe:

import numpy as np
import pandas as pd
list1 = np.array([[1,0],[1,0],[2,0],[2,0],[2,0],[2,0],[3,0],[3,0],[3,0]])
df = pd.DataFrame(list1, columns=["id","class"])

id | class
------ | ------ 
1 | 0
1 | 0
2 | 0
2 | 0
2 | 0
2 | 0
3 | 0
3 | 0
3 | 0

My goal is to get this dataframe:

id | class
------ | ------ 
1 | 0
2 | 0
2 | 0
3 | 0

Any ideas?

  • No, I do not want to delete duplicates. I would like to delete one random row with id 1, 3 random rows with id 2 and 2 random rows with id 3. Hypothetically, there could be duplicates in the output. I changed the example to make it clearer. Commented Aug 25, 2017 at 14:36

1 Answer

This works, but it is not random, because it always drops the first remaining row with the matching id:

for currentID in idsToDelete:
    df = df.drop(df[df.id == currentID].index[0])
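
If the dropped row should be picked at random, the same loop can be adapted as in the sketch below. This is only one possible variant, not the answerer's code: the `rng` generator, its seed, the `result` and `candidates` names, and the choice to skip ids that have no rows left are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame(
    np.array([[1, 0], [1, 0], [2, 0], [2, 0], [2, 0], [2, 0], [3, 0], [3, 0], [3, 0]]),
    columns=["id", "class"],
)
idsToDelete = [1, 2, 2, 3, 3]

# Assumption: a fixed seed, used here only to make the run reproducible.
rng = np.random.default_rng(0)

result = df.copy()
for currentID in idsToDelete:
    # Index labels of the rows that still carry this id.
    candidates = result.index[result["id"] == currentID].to_numpy()
    if len(candidates) == 0:
        continue  # assumption: silently skip ids with no rows left
    # Drop one randomly chosen row label.
    result = result.drop(rng.choice(candidates))

print(result)
```

Because the rows in this example differ only by their id, the ids that remain are the same whichever rows are drawn. Only the surviving index labels vary with the seed.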

3 Comments

And happens to be iterative too. yeesh.
You could turn this around - pass a list of ids you want to keep. There is a vectorised solution for that.
You can use the "sample" method to shuffle the dataframe: df.sample(frac=1)
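
The suggestions in these comments (shuffle with sample, then select the rows to keep instead of dropping them one by one) could be combined roughly as follows. This is a sketch under assumptions, not a solution any commenter posted: the `quota`, `rank`, `shuffled` and `result` names are hypothetical, and `random_state=0` is there only for reproducibility.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({"id": [1, 1, 2, 2, 2, 2, 3, 3, 3], "class": [0] * 9})
idsToDelete = [1, 2, 2, 3, 3]

# How many rows to delete per id: {1: 1, 2: 2, 3: 2}.
quota = pd.Series(idsToDelete).value_counts()

# Shuffle all rows so that "the first k rows per id" is a random choice.
shuffled = df.sample(frac=1, random_state=0)

# Position of each row within its id group, in shuffled order (0, 1, 2, ...).
rank = shuffled.groupby("id").cumcount()

# Per-row deletion quota; ids that are not in idsToDelete get 0.
per_row_quota = shuffled["id"].map(quota).fillna(0)

# Keep rows whose rank is at or beyond the quota, then restore the original order.
result = shuffled[rank >= per_row_quota].sort_index()

print(result)
```

This avoids the Python-level loop. If an id is requested more often than it occurs, all of its rows are removed and no error is raised.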
