I am looking for a way to delete rows in a pandas DataFrame when the index is not guaranteed to be unique.

Say I want to drop the rows at positions 0 and 4 from my DataFrame df. This is the typical code for that:

df.drop(df.index[[0, 4]])

If every index label is unique, this works fine. However, drop() matches rows by label, so if rows 0, 1, and 2 all share the same index label, this code drops rows 0, 1, 2, and 4 instead of just 0 and 4.

My DataFrame is set up this way for good reasons, so I don't want to restructure my data, which looks approximately like this:

        age
site             
mc03    0.39
mc03    0.348
mc03    0.348
mc03    0.42
mc04    0.78
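
For reference, here is a minimal sketch that rebuilds this frame and reproduces the problem. The constructor call is an assumption (only the printed frame appears above), and the variable names `labels` and `dropped` are just for illustration.

```python
import pandas as pd

# Rebuild the sample frame. The constructor call is assumed; only the
# printed output is given in the question.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Positions 0 and 4 translate to the labels "mc03" and "mc04".
labels = df.index[[0, 4]]

# drop() works on labels, so every row labelled "mc03" or "mc04" is removed,
# not only the rows at positions 0 and 4.
dropped = df.drop(labels)
print(len(df), len(dropped))
```

With this particular data the label-based drop removes every row, because positions 0 and 4 between them cover both labels in the index.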

I tried:

del df.iloc[0]

but this fails with:

AttributeError: __delitem__

Any other suggestions for how to accomplish this task?

Update:

I found two ways to do it, but neither is particularly elegant.

to_drop = [0, 4]
df = df.iloc[sorted(set(range(len(df))) - set(to_drop))]
# or:
df = df.iloc[[i for i in range(len(df)) if i not in to_drop]]

Maybe this is as good as it's going to get, though?
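
A slightly more compact variant of the same positional idea, offered only as a sketch: build the list of positions to keep with `np.delete` and pass it to `iloc`. The helper name `drop_positions` is hypothetical, and the sample frame is rebuilt from the printed data above.

```python
import numpy as np
import pandas as pd


def drop_positions(frame, positions):
    """Return frame without the rows at the given integer positions."""
    # All positions 0..len-1, minus the ones we want gone.
    keep = np.delete(np.arange(len(frame)), positions)
    # iloc selects by position, so duplicate index labels do not matter.
    return frame.iloc[keep]


# Sample data reconstructed from the question (constructor call assumed).
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

result = drop_positions(df, [0, 4])
print(result)
```

This never consults the index labels, so it behaves the same whether or not they are unique.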

2 Answers

This is not very elegant either, but let me post it as an alternative:

df = df.reset_index().drop([0, 4]).set_index("site")

It temporarily moves the index into a regular column, which leaves a default integer index with unique labels, then drops the rows and sets the original index back. The idea is from this answer.
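
To make the three steps visible, here is the same chain split into intermediate variables. This is only a sketch: the sample frame is rebuilt from the question's printed data, and the names `flat` and `trimmed` are illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (constructor call assumed).
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Step 1: "site" becomes an ordinary column; the new index is 0..4.
flat = df.reset_index()

# Step 2: labels now coincide with positions, so drop() removes
# exactly the two rows we asked for.
trimmed = flat.drop([0, 4])

# Step 3: put "site" back as the index.
result = trimmed.set_index("site")
print(result)
```

Note that this relies on the index having the name "site". If the index had no name, reset_index() would call the new column "index", and that is the name to pass to set_index().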

An alternative solution, using a NumPy boolean mask:

In [252]: mask = np.ones(len(df), dtype=bool)

In [253]: mask[[0,4]] = False

In [254]: mask
Out[254]: array([False,  True,  True,  True, False], dtype=bool)

In [255]: df[mask]
Out[255]:
        age
mc03  0.348
mc03  0.348
mc03  0.420
