Python: I would like to return a subset of dataframe based on a list, if the records are ordered the same way the list is

Question

I have a dataframe with more than a thousand records, and I would like to return a slice of it where the values appear in the same order as in the list.

e.g.

lst = [0,1,0,0,0,1]

Input

          date  season  hot_or_cold
0   2012-01-01  Winter            0
1   2012-01-02  Winter            1
2   2012-01-03  Winter            0
3   2012-01-04  Winter            0
4   2012-01-05  Winter            0
5   2012-01-06  Winter            1
6   2012-01-07  Winter            1
7   2012-01-08  Winter            1
8   2012-01-09  Winter            0
9   2012-01-10  Winter            1
10  2012-01-11  Winter            0

# 1 - hot
# 0 - cold

Output

         date  season  hot_or_cold
0  2012-01-01  Winter            0
1  2012-01-02  Winter            1
2  2012-01-03  Winter            0
3  2012-01-04  Winter            0
4  2012-01-05  Winter            0
5  2012-01-06  Winter            1

Thank you in advance

Hello and welcome to StackOverflow! You seem to be under the impression that StackOverflow is a site where you post a problem and get some code in return. This is in fact not the case. Your question will most likely be closed or even deleted shortly. To prevent this from happening in the future, please take the tour and take a look at the help center. In particular, make yourself familiar with what is regarded as on-topic around here.
– azro, Commented May 5, 2020 at 10:32

deepak sen · Accepted Answer · 2020-05-05 11:46:29Z

The basic problem is finding a pattern in a dataframe. I found this approach here and implemented it the same way.

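import pandas as pd
import numpy as np

arr = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
df = pd.DataFrame(data=arr, columns=['binary'])
pattern = [0, 1, 0, 0, 0, 1]

# 1.0 where the window ending at a row equals the pattern (NaN for incomplete windows)
matched = df.rolling(len(pattern)).apply(lambda x: all(np.equal(x, pattern)))
matched = matched.sum(axis=1).astype(bool)   # Sum across columns to perform boolean OR

# Positions where a match ends, expanded to the range of rows each match covers
idx_matched = np.where(matched)[0]
subset = [range(match - len(pattern) + 1, match + 1) for match in idx_matched]

# Note: pd.concat raises ValueError if there are no matches (empty list)
result = pd.concat([df.iloc[subs, :] for subs in subset], axis=0)

result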
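As a rough sketch of how the same rolling-window idea might be applied to the question's hot_or_cold column, the helper below wraps the steps in a function. The name rolling_pattern_rows is hypothetical and not part of the answer above. It assumes a single numeric column. Unlike the code above, it returns each matching row only once when matches overlap, and it returns an empty frame when nothing matches.

```python
import numpy as np
import pandas as pd


def rolling_pattern_rows(df, column, pattern):
    """Return the rows of df covered by any window of `column` equal to `pattern`.

    Hypothetical helper for illustration; assumes `column` is numeric.
    """
    pattern = np.asarray(pattern)
    n = len(pattern)
    # 1.0 where the window ending at a row equals the pattern, 0.0 otherwise;
    # the first n - 1 rows are NaN because their windows are incomplete.
    hits = df[column].rolling(n).apply(
        lambda w: float(np.array_equal(w, pattern)), raw=True
    )
    ends = np.flatnonzero(hits.fillna(0).to_numpy())
    # Positions covered by at least one match, de-duplicated and in order.
    rows = sorted({i for end in ends for i in range(end - n + 1, end + 1)})
    return df.iloc[rows]


# The question's data, rebuilt here so the sketch is self-contained.
weather = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=11).strftime("%Y-%m-%d"),
    "season": ["Winter"] * 11,
    "hot_or_cold": [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
})
fragment = rolling_pattern_rows(weather, "hot_or_cold", [0, 1, 0, 0, 0, 1])
print(fragment)
```

On the question's data the only match ends at row 5, so the fragment should contain rows 0 through 5.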


1 Comment

Lwazi Mkhabela Over a year ago

@deepak_sen, thank you. The link you provided me with helped me return exactly what I was looking for.

Valdi_Bo · Accepted Answer · 2020-05-05 11:28:31Z

Define the following two functions.

fndMatch finds a match between s (a Series, the longer sequence) and lst (a list, the shorter one):

def fndMatch(s, lst):
    len1 = s.size
    len2 = len(lst)
    for i1 in range(len1 - len2 + 1):
        i2 = i1 + len2
        if s.iloc[i1:i2].eq(lst).all():
            return (i1, i2)
    return (None, None)

If a match is found, the result is the pair of slice borders; otherwise it is a pair of None values.

Get a fragment of df with hot_or_cold column matching lst:

def getFragment():
    i1, i2 = fndMatch(df.hot_or_cold, lst)
    if i1 is None:
        return None
    else:
        return df.iloc[i1:i2]

When you call it (getFragment()) the result is:

         date  season  hot_or_cold
0  2012-01-01  Winter            0
1  2012-01-02  Winter            1
2  2012-01-03  Winter            0
3  2012-01-04  Winter            0
4  2012-01-05  Winter            0
5  2012-01-06  Winter            1
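If more than one occurrence is needed, or the loop in fndMatch becomes slow on a long Series, one possible variant is to compare all windows at once with NumPy. The sketch below is a swapped-in technique, not the method from the answer above: it uses numpy.lib.stride_tricks.sliding_window_view (which needs NumPy 1.20 or later), and find_all_matches is a hypothetical name chosen for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def find_all_matches(s, lst):
    """Return (start, stop) slice borders for every occurrence of lst in s.

    Hypothetical helper; returns an empty list when there is no match.
    """
    values = s.to_numpy()
    n = len(lst)
    # sliding_window_view raises if the window is longer than the data.
    if n == 0 or len(values) < n:
        return []
    windows = sliding_window_view(values, n)  # shape: (len(values) - n + 1, n)
    starts = np.flatnonzero((windows == np.asarray(lst)).all(axis=1))
    return [(int(i), int(i) + n) for i in starts]


# The question's column, rebuilt here so the sketch is self-contained.
hot_or_cold = pd.Series([0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0])
borders = find_all_matches(hot_or_cold, [0, 1, 0, 0, 0, 1])
print(borders)  # [(0, 6)]
```

Each pair can be passed to df.iloc[start:stop] in the same way as the borders returned by fndMatch.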

Ravinder Karra · Accepted Answer · 2020-05-06 14:10:22Z

Another way, using the accumulate function from itertools:

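from itertools import accumulate
import pandas as pd

def accum(x):
    # Running concatenation: [[a], [a, b], [a, b, c], ...]
    return list(accumulate(x))

lst = [0, 1, 0, 0, 0, 1]
f = lambda x: accum([[i] for i in x])
# One list of growing prefixes per season
b = df.groupby(['season'])['hot_or_cold'].apply(f)
# Label each row by whether the values ending at that row match lst.
# Assumes the rows of df are already grouped by season, in sorted season order.
df['col_accum2'] = [
    ('Match' if item[-len(lst):] == lst else 'NotMatch')
    if len(item) >= len(lst) else 'small list'
    for subitem in b
    for item in subitem
]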
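To make the accumulate step easier to follow, here is a minimal sketch on the question's hot_or_cold values alone, without the groupby. The names prefixes and ends_match are chosen for illustration and do not appear in the answer above. Each prefix holds all values up to and including a row, and a row ends a match when the tail of its prefix equals lst.

```python
from itertools import accumulate

hot_or_cold = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
lst = [0, 1, 0, 0, 0, 1]

# accumulate uses + by default, so one-element lists are concatenated:
# [[0], [0, 1], [0, 1, 0], ...]
prefixes = list(accumulate([v] for v in hot_or_cold))

# True for rows where the last len(lst) values equal lst. Shorter prefixes
# can never be equal to lst, so they come out False.
ends_match = [p[-len(lst):] == lst for p in prefixes]

print(prefixes[2])             # [0, 1, 0]
print(ends_match.index(True))  # 5
```

Note that the prefixes together hold n(n+1)/2 values for n rows, so this approach may use a lot of memory on large frames.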
