
I have a dataframe like the one given below:

date,value
2/10/19,34
2/11/19,34
2/12/19,34
2/13/19,34
2/14/19,34
2/15/19,34
2/16/19,34
2/17/19,0
2/18/19,0
2/19/19,0
2/20/19,22
2/21/19,22
2/22/19,22
2/23/19,22
2/24/19,0
2/25/19,0
2/26/19,0
2/27/19,0
2/28/19,1
3/1/19,2
3/2/19,2
3/3/19,1
3/4/19,0
3/5/19,0
3/6/19,0
3/7/19,3
3/8/19,3
3/9/19,3
3/10/19,0

After every interval, the dataframe has zero values. I want to group the rows such that whenever zero appears more than two times consecutively, a sub-dataframe is created and saved to a file.

Output:


df1 
    2/17/19,0
    2/18/19,0
    2/19/19,0
df2
    2/24/19,0
    2/25/19,0
    2/26/19,0
    2/27/19,0
df3
    3/4/19,0
    3/5/19,0
    3/6/19,0

I have tried many ways to do this, but they all fail.

Thank you.

  • Do you want to keep only the rows with zeros? Commented Mar 26, 2020 at 15:25

2 Answers


You can try using rolling:

def merge_intervals(intervals):
    # Merge overlapping (start, end) intervals in place.
    sorted_intervals = sorted(intervals, key=lambda x: x[0])
    interval_index = 0
    for i in sorted_intervals:
        if i[0] > sorted_intervals[interval_index][1]:
            interval_index += 1
            sorted_intervals[interval_index] = i
        else:
            sorted_intervals[interval_index] = [sorted_intervals[interval_index][0], i[1]]
    return sorted_intervals[:interval_index + 1]

# Indices where a full window of 3 consecutive zeros ends.
end_ids = df[df['value'].rolling(3).apply(lambda x: (x == 0).all()) == 1].index

start_ids = end_ids - 3

# Overlapping windows belong to the same run of zeros, so merge them.
intervals = merge_intervals([*zip(start_ids, end_ids)])

for i, interval in enumerate(intervals):
    df[interval[0] + 1:interval[1] + 1].to_csv('df_' + str(i) + '.csv')

Not the prettiest code, but it works; the merge function was found here: Merging Overlapping Intervals in Python
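For reference, here is a self-contained sketch of the approach above run end to end on the question's data. The `values` list reconstructs the question's value column, and the interval merge is an equivalent rewrite of the function above:

```python
import pandas as pd

# Reconstruction of the question's value column:
# zero runs at rows 7-9, 14-17, 22-24, plus a lone zero at row 28.
values = [34] * 7 + [0] * 3 + [22] * 4 + [0] * 4 + [1, 2, 2, 1] + [0] * 3 + [3] * 3 + [0]
df = pd.DataFrame({'value': values})

# Indices where a full window of 3 consecutive zeros ends.
end_ids = df[df['value'].rolling(3).apply(lambda x: (x == 0).all()) == 1].index
start_ids = end_ids - 3

def merge_intervals(intervals):
    # Merge overlapping (start, end) pairs into maximal runs.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

intervals = merge_intervals(zip(start_ids, end_ids))
# One sub-dataframe per run of 3+ zeros.
groups = [df[s + 1:e + 1] for s, e in intervals]
```

On this data, `intervals` comes out as `[[6, 9], [13, 17], [21, 24]]`, so the three groups are the zero runs of length 3, 4, and 3, matching the desired df1/df2/df3 output.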


3 Comments

I think the code is close, but if I understand correctly, rolling(3) is causing some trouble: it generates dataframes with only 3 rows, and some rows are shared between different dataframes. I will have to understand this code and modify it accordingly. It's helpful nonetheless, thank you.
Sorry, I didn't see you wanted >=3; I fixed the code to work that way.
Hey, this is what I was trying to do for a couple of hours. Thank you, appreciated.

Find where values are equal to zero and take a rolling sum of length 3. Find where the rolling sums are equal to 3. The result will be lagged by 2 spaces so we take the logical or of the result with the -1 shifted and -2 shifted versions of the result.

mask = df['value'].eq(0).rolling(3).sum().eq(3)
mask |= mask.shift(-2) | mask.shift(-1)

In order to get groups, I take the cumulative sum of the logical negation. That will increment for each non-zero value and stagnate at the zeros. However, each group of zeros will be different. At the point I use groupby, that won't matter because I will have used the initial mask to see only the rows that satisfied the condition in the first place.

However, the resulting groups will be a non contiguous set of integers. Because I don't like that, I used factorize to give these groups unique integer values starting from zero.
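As a tiny illustration of that relabeling step (the input values here are made up, standing in for the non-contiguous cumsum output):

```python
import pandas as pd

# Non-contiguous group ids, as the cumsum-on-masked-rows produces.
raw_groups = pd.Series([7, 7, 7, 11, 11, 15, 15, 15])

# factorize replaces them with consecutive integers starting at 0.
codes, uniques = raw_groups.factorize()
```

Here `codes` is `[0, 0, 0, 1, 1, 2, 2, 2]` and `uniques` is `[7, 11, 15]`.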

grp_masked = (~mask).cumsum()[mask].factorize()[0]
g = df[mask].groupby(grp_masked)

Save files

for grp, d in g:
    d.to_csv(f'df_{grp}.csv', index=False)

Create a dictionary

df_dict = {grp: d for grp, d in g}
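Putting the pieces together, a self-contained sketch on the question's data (the `values` list reconstructs the question's value column):

```python
import pandas as pd

# Zero runs at rows 7-9, 14-17, 22-24, plus a lone zero at row 28.
values = [34] * 7 + [0] * 3 + [22] * 4 + [0] * 4 + [1, 2, 2, 1] + [0] * 3 + [3] * 3 + [0]
df = pd.DataFrame({'value': values})

# Mark rows belonging to a run of 3+ consecutive zeros.
mask = df['value'].eq(0).rolling(3).sum().eq(3)
mask |= mask.shift(-2) | mask.shift(-1)

# Label each surviving run 0, 1, 2, ...
grp_masked = (~mask).cumsum()[mask].factorize()[0]
df_dict = {grp: d for grp, d in df[mask].groupby(grp_masked)}
```

The lone zero at row 28 never completes a window of 3, so it is excluded; `df_dict` holds the three zero runs of length 3, 4, and 3.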

Details

This shows the original dataframe along with additional columns that show some of what we calculated.

group_series = pd.Series(
    grp_masked, df.index[mask], pd.Int64Dtype()
)

df_ = df.assign(
    EqZero=df['value'].eq(0),
    Roll2=df['value'].eq(0).rolling(3).sum(),
    Is3=df['value'].eq(0).rolling(3).sum().eq(3),
    Shift=lambda d: d.Is3.shift(-2) | d.Is3.shift(-1),
    Mask=mask,
    PreGrp=(~mask).cumsum(),
    Grp=group_series
)

df_

       date  value  EqZero  Roll2    Is3  Shift   Mask  PreGrp   Grp
0   2/10/19     34   False    NaN  False  False  False       1  <NA>
1   2/11/19      0    True    NaN  False  False  False       2  <NA>
2   2/12/19      0    True    2.0  False  False  False       3  <NA>
3   2/13/19     34   False    2.0  False  False  False       4  <NA>
4   2/14/19     34   False    1.0  False  False  False       5  <NA>
5   2/15/19     34   False    0.0  False  False  False       6  <NA>
6   2/16/19     34   False    0.0  False  False  False       7  <NA>
7   2/17/19      0    True    1.0  False   True   True       7     0
8   2/18/19      0    True    2.0  False   True   True       7     0
9   2/19/19      0    True    3.0   True  False   True       7     0
10  2/20/19     22   False    2.0  False  False  False       8  <NA>
11  2/21/19     22   False    1.0  False  False  False       9  <NA>
12  2/22/19     22   False    0.0  False  False  False      10  <NA>
13  2/23/19     22   False    0.0  False  False  False      11  <NA>
14  2/24/19      0    True    1.0  False   True   True      11     1
15  2/25/19      0    True    2.0  False   True   True      11     1
16  2/26/19      0    True    3.0   True   True   True      11     1
17  2/27/19      0    True    3.0   True  False   True      11     1
18  2/28/19      1   False    2.0  False  False  False      12  <NA>
19   3/1/19      2   False    1.0  False  False  False      13  <NA>
20   3/2/19      2   False    0.0  False  False  False      14  <NA>
21   3/3/19      1   False    0.0  False  False  False      15  <NA>
22   3/4/19      0    True    1.0  False   True   True      15     2
23   3/5/19      0    True    2.0  False   True   True      15     2
24   3/6/19      0    True    3.0   True  False   True      15     2
25   3/7/19      3   False    2.0  False  False  False      16  <NA>
26   3/8/19      3   False    1.0  False  False  False      17  <NA>
27   3/9/19      3   False    0.0  False  False  False      18  <NA>
28  3/10/19      0    True    1.0  False  False  False      19  <NA>

1 Comment

Very precise code and a great explanation, thank you for sharing.
