Remove rows having different consecutive values in dataframe using Pandas

Question

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({"A":['a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l'], "M":[11,4,9,2,2,5,5,6,6]})

My goal is to remove every row whose value in column M differs from the value in both the previous and the next row.

Rows 0, 1 and 2 should therefore be removed, because their values of M differ from their neighbours (11 != 4, 4 != 9 and 9 != 2). However, rows that share a value with an adjacent row must be kept: rows 3 and 4 stay because they both have the value 2. The same reasoning applies to rows 5 and 6, which both have the value 5.

I was able to reach my goal by using the following lines of code:

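to_drop = []
n = len(df)
for i in range(n):
    # a missing neighbour (first or last row) counts as "different"
    differs_prev = i == 0 or df["M"].iloc[i] != df["M"].iloc[i - 1]
    differs_next = i == n - 1 or df["M"].iloc[i] != df["M"].iloc[i + 1]
    if differs_prev and differs_next:
        to_drop.append(i)
df = df.drop(df.index[to_drop]).reset_index(drop=True)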

Can you suggest a smarter, more idiomatic way to achieve this, perhaps using a built-in pandas function?

Here is what the dataframe should look like:

Before: 
   A   M
0  a  11 <----Must be removed
1  s   4 <----Must be removed
2  d   9 <----Must be removed
3  f   2
4  g   2
5  h   5
6  j   5
7  k   6
8  l   6

After
   A  M
0  f  2
1  g  2
2  h  5
3  j  5
4  k  6
5  l  6

jezrael · Accepted Answer · 2017-10-03 14:06:06Z

3

Use boolean indexing with masks created by shift:

m = df["M"].eq(df["M"].shift()) | df["M"].eq(df["M"].shift(-1))
# alternative:
# m = ~(df["M"].ne(df["M"].shift()) & df["M"].ne(df["M"].shift(-1)))
print(df[m])
   A  M
3  f  2
4  g  2
5  h  5
6  j  5
7  k  6
8  l  6
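
To see why the mask works, it can help to lay the shifted columns side by side. The following is a minimal sketch using the question's data; the helper frame `steps` and its column names are made up for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({"A": ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l'],
                   "M": [11, 4, 9, 2, 2, 5, 5, 6, 6]})

# shift() moves the values down one row, shift(-1) moves them up one row.
# The vacated end is filled with NaN, which never compares equal.
steps = pd.DataFrame({
    "M": df["M"],
    "prev": df["M"].shift(),
    "next": df["M"].shift(-1),
})
steps["same_as_prev"] = steps["M"].eq(steps["prev"])
steps["same_as_next"] = steps["M"].eq(steps["next"])
# A row is kept when it matches at least one of its neighbours.
steps["keep"] = steps["same_as_prev"] | steps["same_as_next"]
print(steps)

# Renumber the surviving rows from 0, as in the question's "After" table.
result = df[steps["keep"]].reset_index(drop=True)
print(result)
```

Adding `reset_index(drop=True)` is optional; it only reproduces the 0..5 index shown in the question's expected output.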

edited Oct 3, 2017 at 14:06

answered Oct 3, 2017 at 14:00

jezrael


BENY · Accepted Answer · 2017-10-03 14:25:45Z

3

Using diff together with isin:

df.loc[df.M.isin(df[df.M.diff()==0].M),:]
Out[140]: 
   A  M
3  f  2
4  g  2
5  h  5
6  j  5
7  k  6
8  l  6

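Note that the previous one-liner may not work in general. With M = 1,1,2,1,3,4, isin also keeps the isolated 1, because it matches by value rather than by position. A position-based version: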

m=df[df.M.diff()==0].index.values.tolist()
m.extend([x-1 for x in m])
# recent pandas versions reject a set as an indexer, so pass a sorted list instead
df.loc[sorted(set(m))]
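
The difference between the two filters is easiest to see on the counterexample itself. This is a small sketch, assuming a hypothetical frame `df2` built from the values 1,1,2,1,3,4 with a default integer index (the `x - 1` step relies on that). It passes a sorted list to `.loc` rather than a set, since recent pandas versions do not accept set indexers.

```python
import pandas as pd

# Hypothetical frame built from the counterexample values 1,1,2,1,3,4.
df2 = pd.DataFrame({"A": list("abcdef"), "M": [1, 1, 2, 1, 3, 4]})

# Value-based filter: keeps every row whose value occurs in some run,
# so the isolated 1 at index 3 slips through.
by_value = df2.loc[df2.M.isin(df2[df2.M.diff() == 0].M), :]
print(by_value.index.tolist())     # [0, 1, 3]

# Position-based filter: rows equal to their predecessor,
# plus the predecessor itself.
m = df2[df2.M.diff() == 0].index.values.tolist()
m.extend([x - 1 for x in m])
by_position = df2.loc[sorted(set(m))]
print(by_position.index.tolist())  # [0, 1]
```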

Another nice solution, from MaxU:

df.loc[df.M.diff().eq(0) | df.M.diff(-1).eq(0)]
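
As a quick check, the diff-based mask can be compared with a shift-based one on the question's data. The sketch below also shows a shift-based mask on a string column, since diff relies on subtraction and so presumes numeric values. The Series `labels` is a hypothetical example and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({"A": ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l'],
                   "M": [11, 4, 9, 2, 2, 5, 5, 6, 6]})

# diff() is M minus the previous row, diff(-1) is M minus the next row;
# a difference of 0 means the neighbour holds the same value.
diff_mask = df.M.diff().eq(0) | df.M.diff(-1).eq(0)
shift_mask = df.M.eq(df.M.shift()) | df.M.eq(df.M.shift(-1))
print(diff_mask.equals(shift_mask))  # True

# Hypothetical string column: compare against shifted copies
# instead of subtracting.
labels = pd.Series(["x", "x", "y", "z", "z"])
label_mask = labels.eq(labels.shift()) | labels.eq(labels.shift(-1))
print(label_mask.tolist())           # [True, True, False, True, True]
```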

edited Oct 3, 2017 at 14:25

answered Oct 3, 2017 at 14:07

BENY


3 Comments

MaxU - stand with Ukraine Over a year ago

what about: df.loc[df.M.diff().eq(0) | df.M.diff(-1).eq(0)]?

BENY Over a year ago

@MaxU Nice solution, add that as an answer dude ~ :)

MaxU - stand with Ukraine Over a year ago

It would be too similar to yours and jezrael's... Please feel free to add it to your answer ;-)
