I have a dataframe with more than a thousand records, and I would like to return a slice of it: the consecutive rows whose values in one column appear in the same order as the values in a given list.

e.g.

lst = [0,1,0,0,0,1]

Input

          date  season  hot_or_cold
0   2012-01-01  Winter            0
1   2012-01-02  Winter            1
2   2012-01-03  Winter            0
3   2012-01-04  Winter            0
4   2012-01-05  Winter            0
5   2012-01-06  Winter            1
6   2012-01-07  Winter            1
7   2012-01-08  Winter            1
8   2012-01-09  Winter            0
9   2012-01-10  Winter            1
10  2012-01-11  Winter            0

# 1 - hot
# 0 - cold

Output

         date  season  hot_or_cold
0  2012-01-01  Winter            0
1  2012-01-02  Winter            1
2  2012-01-03  Winter            0
3  2012-01-04  Winter            0
4  2012-01-05  Winter            0
5  2012-01-06  Winter            1

Thank you in advance

  • Hello and welcome to StackOverflow! You seem to be under the impression that StackOverflow is a site where you post a problem and get some code in return. This is in fact not the case. Your question will most likely be closed or even deleted shortly. To prevent this from happening in the future, please take the tour and take a look at the help center. In particular, make yourself familiar with what is regarded as on-topic around here. Commented May 5, 2020 at 10:32
  • Sorry for the way the tables turned out. Commented May 5, 2020 at 10:32

3 Answers

The basic problem is finding a pattern in a dataframe. I found an approach for this in a linked answer and have implemented the same idea here.

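import pandas as pd
import numpy as np

arr = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
df = pd.DataFrame(data=arr, columns=['binary'])
pattern = [0, 1, 0, 0, 0, 1]

# Mark the last row of every window of len(pattern) rows that equals the pattern
matched = df.rolling(len(pattern)).apply(lambda x: all(np.equal(x, pattern)))
# Sum across columns to perform a boolean OR (NaN from incomplete windows counts as 0)
matched = matched.sum(axis=1).astype(bool)

# Positions where a matching window ends
idx_matched = np.where(matched)[0]
# Row positions covered by each matching window
subset = [range(match - len(pattern) + 1, match + 1) for match in idx_matched]

# Collect the matching rows (pd.concat raises ValueError if nothing matched)
result = pd.concat([df.iloc[subs, :] for subs in subset], axis=0)

result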
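
For reference, here is a minimal sketch of the same rolling-window idea applied to the sample data from the question. The helper name rows_matching and its parameters are made up for illustration, and it returns an empty frame instead of raising when nothing matches; treat it as one possible adaptation under those assumptions, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=11, freq="D"),
    "season": "Winter",
    "hot_or_cold": [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
})
lst = [0, 1, 0, 0, 0, 1]


def rows_matching(frame, column, pattern):
    """Return the rows of every window in frame[column] that equals pattern.

    Hypothetical helper, not part of pandas.
    """
    n = len(pattern)
    # 1.0 on the last row of each matching window, 0.0 elsewhere,
    # NaN for the first n - 1 rows (incomplete windows).
    hits = frame[column].rolling(n).apply(
        lambda w: float(np.array_equal(w, pattern)), raw=True
    )
    ends = np.flatnonzero(hits.fillna(0).to_numpy())
    if len(ends) == 0:
        return frame.iloc[0:0]  # empty frame with the same columns
    return pd.concat([frame.iloc[e - n + 1 : e + 1] for e in ends])


result = rows_matching(df, "hot_or_cold", lst)
print(result)
```

With the question's data there is a single matching window, so result holds rows 0 through 5.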

1 Comment

@deepak_sen, thank you. The link you provided me with helped me return exactly what I was looking for.

Define the following two functions:

  1. Find the first match of lst (a list, the shorter sequence) within s (a Series, the longer one).

    def fndMatch(s, lst):
        len1 = s.size
        len2 = len(lst)
        for i1 in range(len1 - len2 + 1):
            i2 = i1 + len2
            if s.iloc[i1:i2].eq(lst).all():
                return (i1, i2)
        return (None, None) 
    

    When a match is found, the result is the pair of slice borders (start inclusive, end exclusive); otherwise it is a pair of None values.

  2. Get a fragment of df with hot_or_cold column matching lst:

    def getFragment():
        i1, i2 = fndMatch(df.hot_or_cold, lst)
        if i1 is None:
            return None
        else:
            return df.iloc[i1:i2]
    

Calling getFragment() on the sample data returns:

         date  season  hot_or_cold
0  2012-01-01  Winter            0
1  2012-01-02  Winter            1
2  2012-01-03  Winter            0
3  2012-01-04  Winter            0
4  2012-01-05  Winter            0
5  2012-01-06  Winter            1
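
Because getFragment reads df and lst from the enclosing scope, it can be handy to pass them in explicitly. The sketch below does that with the question's sample data; the names find_match and get_fragment and their parameters are illustrative renamings, not anything defined in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=11, freq="D"),
    "season": "Winter",
    "hot_or_cold": [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
})
lst = [0, 1, 0, 0, 0, 1]


def find_match(s, pattern):
    """Return (start, stop) of the first window of s equal to pattern, else (None, None)."""
    n = len(pattern)
    for start in range(len(s) - n + 1):
        stop = start + n
        # Element-wise comparison of the window against the pattern.
        if s.iloc[start:stop].eq(pattern).all():
            return (start, stop)
    return (None, None)


def get_fragment(frame, column, pattern):
    """Return the rows of frame covering the first match in frame[column], or None."""
    start, stop = find_match(frame[column], pattern)
    if start is None:
        return None
    return frame.iloc[start:stop]


fragment = get_fragment(df, "hot_or_cold", lst)
print(fragment)
```

Only the first match is returned; a pattern that never occurs yields None, so callers need to check for that before using the result.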

Another way, using the accumulate function from itertools:

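from itertools import accumulate
import pandas as pd

def accum(x):
    return list(accumulate(x))

lst = [0, 1, 0, 0, 0, 1]
# Build the running prefixes of each group's values: [v0], [v0, v1], ...
f = lambda x: accum([[i] for i in x])
b = df.groupby(['season'])['hot_or_cold'].apply(f)
# Label each row by whether the last len(lst) values up to that row equal lst
# (assumes the rows of df are already ordered by season, matching the groupby order)
df['col_accum2'] = [
    ('Match' if item[-len(lst):] == lst else 'NotMatch') if len(item) >= len(lst) else 'small list'
    for subitem in b
    for item in subitem
]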
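
To see what accumulate contributes here, the following small sketch runs the same idea on a plain list, without pandas. The variable names values, prefixes and labels are illustrative only. Note that every prefix is a full copy of the values seen so far, so the work grows quadratically with the number of rows.

```python
from itertools import accumulate

values = [0, 1, 0, 0, 0, 1, 1]
lst = [0, 1, 0, 0, 0, 1]

# accumulate with the default addition concatenates the one-element lists,
# producing the running prefixes [0], [0, 1], [0, 1, 0], ...
prefixes = list(accumulate([v] for v in values))

# A row is a match when the last len(lst) values up to it equal lst.
labels = [
    "small list" if len(p) < len(lst)
    else ("Match" if p[-len(lst):] == lst else "NotMatch")
    for p in prefixes
]

print(prefixes[:3])  # [[0], [0, 1], [0, 1, 0]]
print(labels)
```

The label lands on the last row of a matching window, so the matching rows themselves are that row and the len(lst) - 1 rows before it.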
