Pandas: return index values for first instance and last instance of value

Question

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]})
df.set_index('index', inplace=True)

   X
index   
0      0
1      0
2      1
3      1
4      0
5      0
6      1
7      1
8      1
9      0
10     0

What I need is a list of tuples holding the index values of the first and last 1 in each sequence of consecutive 1s (sorry if that's confusing), i.e.:

Want:

[(2,3), (6,8)]

The first 1 occurs at index 2, and the last 1 in that sequence occurs at index 3. The next 1 occurs at index 6, and the last 1 in that sequence occurs at index 8.
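To pin down the expected output, here is a minimal reference sketch that uses only the standard library's `itertools.groupby`, which starts a new group wherever the key changes, so each group is one run of equal values. The function name `first_last_of_runs` is hypothetical, and the two lists simply restate the example DataFrame's index and `X` column.

```python
from itertools import groupby

# Index labels and X values from the example DataFrame above
index = list(range(11))
values = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]

def first_last_of_runs(labels, vals, target=1):
    """Return (first_label, last_label) for each run of consecutive `target` values."""
    result = []
    # groupby splits the (label, value) pairs wherever the value changes
    for value, run in groupby(zip(labels, vals), key=lambda pair: pair[1]):
        if value == target:
            run = list(run)
            result.append((run[0][0], run[-1][0]))
    return result

print(first_last_of_runs(index, values))  # [(2, 3), (6, 8)]
```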

What I've tried:

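I can grab the first one using numpy's argmax and argmin functions, e.g.:

import numpy as np

x1 = np.argmax(df.values)       # position of the first 1
y1 = np.argmin(df.values[x1:])  # offset of the first 0 after it
(x1, x1 + y1 - 1)               # first run: positions 2 to 3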

This gives me the first tuple, but iterating through the rest seems messy, and I feel like there's a better way.
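For comparison, this is a sketch of what looping the argmax/argmin step over the remaining slice could look like. The function name `runs_by_argmax` is hypothetical; it assumes the column holds only 0s and 1s, and it returns positions, which equal the index labels here only because the index is 0..10. Note the special case for a run that reaches the end of the array, where `argmin` would otherwise return 0.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0])

def runs_by_argmax(a):
    """Hypothetical loop that repeats the argmax/argmin idea until no 1s remain."""
    result = []
    start = 0
    while start < len(a) and a[start:].any():
        first = start + int(np.argmax(a[start:]))     # position of the next 1
        rest = a[first:]
        if rest.all():                                 # run extends to the end
            last = len(a) - 1
        else:
            last = first + int(np.argmin(rest)) - 1    # position before the next 0
        result.append((first, last))
        start = last + 1
    return result

print(runs_by_argmax(a))  # [(2, 3), (6, 8)]
```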

harpan · Accepted Answer · 2018-05-24 20:50:13Z

2

You can use more_itertools.consecutive_groups:

import more_itertools as mit

def find_ranges(iterable):
    """Yield (first, last) for each range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        # A run of length 1 yields (n, n), so every item is a tuple
        yield group[0], group[-1]

# Index labels where X == 1 (they must be consecutive integers within a run)
list(find_ranges(df['X'][df['X'] == 1].index))

Output:

[(2, 3), (6, 8)]
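If you would rather avoid the extra dependency, a well-known standard-library recipe does the same grouping: within a run of consecutive integers, value minus position stays constant, so `itertools.groupby` on that difference splits the input into runs. This is a sketch; `consecutive_groups_stdlib` is a hypothetical name, and like `consecutive_groups` it only works when the index labels are consecutive integers.

```python
from itertools import groupby

def consecutive_groups_stdlib(iterable):
    """Yield lists of consecutive integers (a stdlib stand-in for mit.consecutive_groups)."""
    # Within a run of consecutive integers, value - position is constant
    for _, group in groupby(enumerate(iterable), key=lambda pair: pair[1] - pair[0]):
        yield [value for _, value in group]

ones_index = [2, 3, 6, 7, 8]  # index labels where X == 1 in the example
print([(g[0], g[-1]) for g in consecutive_groups_stdlib(ones_index)])  # [(2, 3), (6, 8)]
```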

answered May 24, 2018 at 20:50


user3483203 · Accepted Answer · 2018-05-24 21:22:07Z

2

You can use a third-party library, more_itertools:

loc with mit.consecutive_groups

import more_itertools as mit

x = [list(group) for group in mit.consecutive_groups(df.loc[df['X'] == 1].index)]

# [[2, 3], [6, 7, 8]]

Then a simple list comprehension keeps the first and last element of each group:

x = [(i[0], i[-1]) for i in x]

#  [(2, 3), (6, 8)]

An approach using numpy, adapted from a great answer by @Warren Weckesser

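import numpy as np

def runs(a):
    # 1 where a == 1, else 0, padded with a 0 on each side
    isone = np.concatenate(([0], np.equal(a, 1).view(np.int8), [0]))
    # |diff| is 1 exactly where a run of 1s starts or ends
    absdiff = np.abs(np.diff(isone))
    # Edge positions come in (start, end-exclusive) pairs
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return [(i, j-1) for i, j in ranges]

runs(df['X'].values)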
# [(2, 3), (6, 8)]
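Note that `runs` returns positions, not index labels; they coincide here only because the index is 0..10. A sketch of mapping the positions back to labels, using a hypothetical frame `df2` with a non-default index (`runs` is repeated so the block runs on its own):

```python
import numpy as np
import pandas as pd

def runs(a):
    # Pad with 0s so runs touching either end still produce a start and an end edge
    isone = np.concatenate(([0], np.equal(a, 1).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isone))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return [(i, j - 1) for i, j in ranges]

# Hypothetical frame whose index labels are not 0..n-1
df2 = pd.DataFrame({'X': [1, 1, 0, 1]}, index=[10, 20, 30, 40])
positions = runs(df2['X'].values)
idx = df2.index.tolist()                 # plain Python labels
labels = [(idx[i], idx[j]) for i, j in positions]
print(labels)  # [(10, 20), (40, 40)]
```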

edited May 24, 2018 at 21:22

answered May 24, 2018 at 20:54


Scott Boston · Accepted Answer · 2018-05-24 23:21:40Z

1

Here's a pure pandas solution:

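# Cumulative count of zeros, masked to NaN on the zero rows: each run of 1s
# shares one label, and groupby drops the NaN labels
key = df['X'].eq(0).cumsum().mask(df['X'].eq(0))
df.groupby(key).apply(lambda x: (x.first_valid_index(), x.last_valid_index())).tolist()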

Output:

[(2, 3), (6, 8)]
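Another common pandas idiom, offered here as an alternative sketch rather than part of the answer above, numbers the runs by comparing each value with the previous one (`shift`) and taking a cumulative sum, then aggregates the first and last label per run. The names `run_id`, `bounds`, and `result` are illustrative.

```python
import pandas as pd

# Same data as the question
df = pd.DataFrame({'index': list(range(11)),
                   'X': [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]}).set_index('index')

# A new run starts wherever X differs from the previous row; cumsum numbers the runs
run_id = df['X'].ne(df['X'].shift()).cumsum()

# Keep only the rows where X == 1, as a Series of their index labels
is_one = df['X'].eq(1)
labels = df.index.to_series()[is_one]

# First and last label within each run of 1s
bounds = labels.groupby(run_id[is_one]).agg(['first', 'last'])
result = list(zip(bounds['first'].tolist(), bounds['last'].tolist()))
print(result)  # [(2, 3), (6, 8)]
```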

answered May 24, 2018 at 23:21
