I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'index':[0,1,2,3,4,5,6,7,8,9,10], 'X':[0,0,1,1,0,0,1,1,1,0,0]})
df.set_index('index', inplace = True)

       X
index   
0      0
1      0
2      1
3      1
4      0
5      0
6      1
7      1
8      1
9      0
10     0

What I need is a list of tuples giving the index values of the first and last 1 in each run of consecutive 1s (sorry if that's confusing), i.e.

Want:

[(2,3), (6,8)]

The first run of 1s starts at index 2 and ends at index 3. The next run starts at index 6 and ends at index 8.

What I've tried:

I can grab the first one using numpy's argmax function. i.e.

x1 = np.argmax(df.values)        # position of the first 1
y1 = np.argmin(df.values[x1:])   # offset from x1 to the next 0
(x1, x1 + y1 - 1)

Which will give me the first tuple, but iterating through seems messy and I feel like there's a better way.

3 Answers

2

You can use more_itertools.consecutive_groups (more_itertools is a third-party package):

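import more_itertools as mit

def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            # A run of length one is yielded as a bare value, not a tuple
            yield group[0]
        else:
            yield group[0], group[-1]

# Keep only the index labels where X == 1, then collapse consecutive labels
list(find_ranges(df['X'][df['X']==1].index))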

Output:

[(2, 3), (6, 8)]
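
If installing a third-party package is not an option, the same grouping can probably be done with the standard library's itertools.groupby. This is only a sketch: the helper name one_runs and the plain lists standing in for the DataFrame's index and column are made up for illustration.

```python
from itertools import groupby

def one_runs(labels, values):
    """Return (first_label, last_label) for each run of consecutive 1s."""
    runs = []
    # groupby splits the (label, value) pairs into maximal runs of equal keys
    for is_one, group in groupby(zip(labels, values), key=lambda pair: pair[1] == 1):
        if is_one:
            group = list(group)
            runs.append((group[0][0], group[-1][0]))
    return runs

labels = list(range(11))                      # stands in for df.index
values = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]    # stands in for df['X']
print(one_runs(labels, values))  # [(2, 3), (6, 8)]
```

Unlike find_ranges, this returns a run of length one as a tuple (i, i) rather than a bare value. With the DataFrame from the question it could be called as one_runs(df.index, df['X']).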

2

You can use a third party library: more_itertools

loc with mit.consecutive_groups

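import more_itertools as mit

# 'X' is the column from the question's DataFrame
x = [list(group) for group in mit.consecutive_groups(df.loc[df['X'] == 1].index)]

# [[2, 3], [6, 7, 8]]

A simple list comprehension then keeps the first and last label of each group:

x = [(i[0], i[-1]) for i in x]

#  [(2, 3), (6, 8)]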

An approach using numpy, adapted from a great answer by @Warren Weckesser

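import numpy as np

def runs(a):
    # Pad with a 0 on each side so runs touching either end are detected
    isone = np.concatenate(([0], np.equal(a, 1).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isone))
    # Edges come in (start, stop) pairs; stop is one past the last 1
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return [(i, j-1) for i, j in ranges]

# Returns positions, which coincide with the index labels for this DataFrame
runs(df['X'].values)
# [(2, 3), (6, 8)]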
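
To see why the diff trick works, it may help to look at the intermediate arrays for the sample data. The sketch below keeps the sign of the difference instead of taking its absolute value, which is a small variation on the function above, not the original code. The names edges, starts and stops are illustrative.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0])

# Pad with a 0 on each side so runs touching either end still produce an edge
isone = np.concatenate(([0], (a == 1).astype(np.int8), [0]))

# +1 marks the position where a run starts, -1 marks one past where it ends
edges = np.diff(isone)
starts = np.flatnonzero(edges == 1)
stops = np.flatnonzero(edges == -1)   # exclusive end positions

print(edges.tolist())                    # [0, 0, 1, 0, -1, 0, 1, 0, 0, -1, 0, 0]
print(starts.tolist(), stops.tolist())   # [2, 6] [4, 9]

pairs = [(int(i), int(j) - 1) for i, j in zip(starts, stops)]
print(pairs)                             # [(2, 3), (6, 8)]
```

These are positions rather than index labels. If the DataFrame's index were not 0..n-1, each position would need to be mapped through df.index.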

1

Here's a pure pandas solution:

df.groupby(df['X'].eq(0).cumsum().mask(df['X'].eq(0)))\
  .apply(lambda x: (x.first_valid_index(),x.last_valid_index()))\
  .tolist()

Output:

[(2, 3), (6, 8)]
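
The grouping key in that one-liner is fairly dense, so here is a step-by-step sketch of what it appears to do on the sample data. The intermediate names is_zero, run_id and key are introduced only for illustration, and the final list comprehension is a simplified stand-in for the apply call.

```python
import pandas as pd

df = pd.DataFrame({'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]})

is_zero = df['X'].eq(0)
# The running count of zeros stays constant across a run of 1s,
# so it gives every run its own label
run_id = is_zero.cumsum()
# Blank out the zero rows; groupby drops NaN keys by default
key = run_id.mask(is_zero)

print(run_id.tolist())  # [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6]

# First and last index label of each remaining group
pairs = [(int(g.index[0]), int(g.index[-1])) for _, g in df.groupby(key)]
print(pairs)  # [(2, 3), (6, 8)]
```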
