Fill the values between two pandas column values with same values

Question

I have a data frame like this:

df1
col1    col2
 1        A
 2        A
 3        A
 4        B
 5        A
 6        A
 7        B
 8        A
 9        A
10        A
11        C
12        C
13        A
14        A
15        C
16        A
17        C

In the above data frame, the total number of B values and the total number of C values are always even. I want to fill all the values between a pair of B values with B, and all the values between a pair of C values with C.

So the final data frame should look like this:

df1
col1    col2
 1        A
 2        A
 3        A
 4        B
 5        B
 6        B
 7        B
 8        A
 9        A
10        A
11        C
12        C
13        A
14        A
15        C
16        C
17        C

I could do it with a for loop, but the execution time would be huge. I am looking for a pandas shortcut or a more Pythonic way to do it.

Why does 16 become C but 13 and 14 do not? What are the rules exactly? Can you write a for loop that implements exactly what you need, then we can optimize that?
– John Zwinck, Commented Nov 16, 2019 at 7:54
Interesting, do you see how you never mentioned that requirement in the question? Can you provide a simple, maybe slow but correct, for loop that does it?
– John Zwinck, Commented Nov 16, 2019 at 7:58

jezrael · Accepted Answer · 2019-11-16 08:40:20Z


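The idea is to single out the B (or C) values that are not part of a consecutive run and to treat every other value as missing. Then forward fill the missing values, but keep a filled value only where it matches the back-filled one, so that only the positions between two markers are filled. Finally, restore the original values everywhere else with Series.fillna: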

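for v in ['B', 'C']:
    # rows holding the current marker
    m1 = df['col2'].eq(v)
    # rows that belong to a run of two or more equal consecutive values in m1
    m2 = m1.ne(m1.shift()).cumsum().duplicated(keep=False)
    # keep only isolated markers; everything else becomes NaN
    s = df['col2'].where(m1 & ~m2)
    ff = s.ffill()
    # keep a fill only where forward fill and back fill agree, then restore the rest
    df['col2'] = ff.where(ff == s.bfill()).fillna(df['col2'])
print(df)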
    col1 col2
0      1    A
1      2    A
2      3    A
3      4    B
4      5    B
5      6    B
6      7    B
7      8    A
8      9    A
9     10    A
10    11    C
11    12    C
12    13    A
13    14    A
14    15    C
15    16    C
16    17    C
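
To see why the consecutive C values at rows 11 and 12 do not trigger a fill, it can help to run a single pass of the loop by hand. The sketch below rebuilds the sample frame from the question and keeps the intermediate Series for `v = "C"`. The variable names `runs`, `bf`, `isolated_rows` and `filled` are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": range(1, 18), "col2": list("AAABAABAAACCAACAC")})

v = "C"
m1 = df["col2"].eq(v)               # True at every 'C'
runs = m1.ne(m1.shift()).cumsum()   # one label per run of equal consecutive values in m1
m2 = runs.duplicated(keep=False)    # True for rows in runs longer than one row
s = df["col2"].where(m1 & ~m2)      # only isolated 'C' rows keep their value

isolated_rows = df.loc[s.notna(), "col1"].tolist()
print(isolated_rows)                # [15, 17]

ff, bf = s.ffill(), s.bfill()
# A row is filled only if the nearest isolated marker before it and after it both exist
filled = ff.where(ff == bf).fillna(df["col2"])
print("".join(filled))              # AAABAABAAACCAACCC
```

Only rows 15 and 17 survive as isolated markers, so the single row between them (16) is the only one filled in this pass. The B values are untouched until the loop runs with `v = "B"`.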

edited Nov 16, 2019 at 8:40

answered Nov 16, 2019 at 8:33

jezrael


1 Comment

ansev Over a year ago

this is a bit cumbersome

ansev · Accepted Answer · 2019-11-16 10:58:30Z


You only need to select the rows where the running count of the marker (Series.cumsum) is odd, then overwrite those rows with Series.mask:

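for letter in ['B', 'C']:
    # odd running count: an opening marker has been seen but not yet its closing partner
    mask = (df['col2'].eq(letter).cumsum() % 2) == 1
    df['col2'] = df['col2'].mask(mask, letter)
print(df)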

    col1 col2
0     1    A 
1     2    A 
2     3    A 
3     4    B 
4     5    B 
5     6    B 
6     7    B 
7     8    A 
8     9    A 
9    10    A 
10   11    C 
11   12    C 
12   13    A 
13   14    A 
14   15    C 
15   16    C 
16   17    C
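
The parity trick is easier to follow with the running count printed next to the column. The sketch below rebuilds the sample frame from the question and shows the count for C only. The names `is_c`, `count`, `inside` and `filled` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": range(1, 18), "col2": list("AAABAABAAACCAACAC")})

is_c = df["col2"].eq("C")
count = is_c.cumsum()      # number of 'C' values seen so far, including the current row
inside = count % 2 == 1    # odd count: an opening 'C' is still waiting for its partner

# Show rows 10 to 17, where the C values live
print(pd.DataFrame({"col1": df["col1"], "col2": df["col2"],
                    "count": count, "inside": inside}).iloc[9:])

filled = df["col2"].mask(inside, "C")
print("".join(filled))     # AAABAABAAACCAACCC
```

The closing marker of each pair has an even count, so the mask skips it, but that row already holds the marker. Note that this relies on the premise from the question that each marker occurs an even number of times; with an odd number, everything after the last unpaired marker would be overwritten.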

answered Nov 16, 2019 at 10:58

ansev
