Efficient way to segment a list of lists (or array) by second index

Question

I am trying to segment a list of lists into blocks based on the type of data at each sublist's second index (NoneType or int).

Example Data:

arr = [
[81, None, None],
[82, None, None],
[83, None, None],
[84, None, None],
[85, 161, 360],
[86, 161, 360],
[87, 161, 360],
[88, 160, 360],
[89, 160, 360],
[90, 160, 360],
[91, 160, 360],
[92, 160, 360],
[93, None, None],
[94, None, None],
[95, None, None],
[96, 153, 359],
[97, 153, 359],
[98, 153, 359],
[99, 153, 359]]

This can be treated as a list of lists, as I said, or as a numpy array (i.e., numpy.array(arr)). Whichever is easier.

I am aiming for something like this (it doesn't need to be identical):

[(81, 84, None),                   # or [[None, None], [None, None]...] ... either is fine.
 (85, 92, [[161, 360], [161, 360]]...),
 (93, 95, None),
 (96, 99, [[153, 359], [153, 359]]...)
]

Sloppy attempt:

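none_end = 0
none_start = False   # False means "not currently inside a None block"
blocks_loc = list()
for i in arr:
    if None in i:
        # Inside a None block: remember where it started, extend its end
        if not none_start:
            none_start = i[0]
        none_end = i[0]
    elif None not in i and none_start is not False:
        # First int row after a None block: close that block
        blocks_loc.append((none_start, none_end))
        none_start = False
        none_end = 0
# A None block that runs to the end of arr is never closed inside the loop
if none_start is not False:
    blocks_loc.append((none_start, none_end))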

Then I could pull out the data based on blocks_loc (which now contains [(81, 84), (93, 95)]).

However, it is hard to put into words just how terrible and ugly that code is. Something better would be great.

DSM · Accepted Answer · 2017-01-15 02:20:47Z


I might use itertools.groupby:

from itertools import groupby
groups = (list(g) for k,g in groupby(arr, key=lambda x: x[1]))
final = [(g[0][0], g[-1][0], [x[1:] for x in g]) for g in groups]

which gives me

>>> pprint.pprint(final)
[(81, 84, [[None, None], [None, None], [None, None], [None, None]]),
 (85, 87, [[161, 360], [161, 360], [161, 360]]),
 (88, 92, [[160, 360], [160, 360], [160, 360], [160, 360], [160, 360]]),
 (93, 95, [[None, None], [None, None], [None, None]]),
 (96, 99, [[153, 359], [153, 359], [153, 359], [153, 359]])]

...and I just noticed that I was using x[1] as the index to group on, and you want x[2] instead. Well, that's left as an exercise for the reader. ;-)

If you wanted finer control over the output (e.g. to handle the case where the start and end indices are the same), it'd be easier just to loop over the key/group pairs returned by groupby and yield whatever you like.
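A minimal sketch of that loop, assuming the goal is the question's output shape (a bare None for all-None blocks, the [x[1], x[2]] pairs otherwise). The generator name `segment` is hypothetical, and grouping on `row[1] is None` rather than on the value of `row[1]` is an assumption that keeps the 161 and 160 rows in a single block:

```python
from itertools import groupby

# The question's example data, rebuilt compactly (same 19 rows)
arr = ([[n, None, None] for n in range(81, 85)]
       + [[n, 161, 360] for n in range(85, 88)]
       + [[n, 160, 360] for n in range(88, 93)]
       + [[n, None, None] for n in range(93, 96)]
       + [[n, 153, 359] for n in range(96, 100)])


def segment(rows):
    """Yield (first_id, last_id, payload) for each run of consecutive rows.

    Rows are grouped on whether row[1] is None, so the int rows between two
    None runs form one block even when their values differ (161 vs 160).
    """
    for is_none, group in groupby(rows, key=lambda row: row[1] is None):
        group = list(group)  # each groupby group is a one-shot iterator
        start, end = group[0][0], group[-1][0]
        # All-None blocks get a bare None, which the question says is acceptable
        payload = None if is_none else [row[1:] for row in group]
        yield start, end, payload


for start, end, payload in segment(arr):
    print(start, end, None if payload is None else len(payload))
```

This prints `81 84 None`, `85 92 8`, `93 95 None` and `96 99 4`. Because the body of the loop is ordinary code, special cases such as a one-row block (start == end) can be given whatever shape you prefer.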

Also note that groupby only groups consecutive items that share a key. If rows with the same key are not necessarily adjacent and you want them merged into one group, sort by that same key first.
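A small illustration of the contiguity point, using hypothetical four-row data (not from the question) in which the None rows are separated by int rows; the key function name `is_none_key` is likewise made up for the example:

```python
from itertools import groupby

# Hypothetical data in which the None rows are not adjacent to each other
rows = [[1, None], [2, 5], [3, None], [4, 7]]


def is_none_key(row):
    return row[1] is None


# Unsorted: groupby starts a new group every time the key changes
unsorted_runs = [[r[0] for r in g] for _, g in groupby(rows, key=is_none_key)]

# Sorted on the same key first: equal keys become adjacent
# (False sorts before True, so the int rows come first; sorted() is stable)
sorted_rows = sorted(rows, key=is_none_key)
sorted_runs = [[r[0] for r in g] for _, g in groupby(sorted_rows, key=is_none_key)]

print(unsorted_runs)  # [[1], [2], [3], [4]]
print(sorted_runs)    # [[2, 4], [1, 3]]
```

Sorting discards the original row order, so it only fits when you want one group per key rather than the positional blocks the question asks for.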


3 Comments

f5r5e5d Over a year ago

Not quite there: your grouping doesn't handle the int data case right (the 161 and 160 rows end up in separate groups). You need to change the groupby key lambda to key=lambda x: type(x[1]).

DSM Over a year ago

@f5r5e5d: sure, the OP can pick whatever condition he wants to use as a keyfunc.

lnNoam Over a year ago

I did end up editing it a little, but it got me very close. Many thanks.
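The first comment's suggestion can be checked by keeping the answer's code and changing only the groupby key to `type(x[1])`; the data below is the question's example rebuilt compactly:

```python
from itertools import groupby

# The question's example data, rebuilt compactly (same 19 rows)
arr = ([[n, None, None] for n in range(81, 85)]
       + [[n, 161, 360] for n in range(85, 88)]
       + [[n, 160, 360] for n in range(88, 93)]
       + [[n, None, None] for n in range(93, 96)]
       + [[n, 153, 359] for n in range(96, 100)])

# Same code as the answer, with the key changed as the comment suggests:
# group on the *type* of x[1] (NoneType or int) instead of its value
groups = (list(g) for k, g in groupby(arr, key=lambda x: type(x[1])))
final = [(g[0][0], g[-1][0], [x[1:] for x in g]) for g in groups]

for start, end, values in final:
    print(start, end, len(values))
```

This prints `81 84 4`, `85 92 8`, `93 95 3` and `96 99 4`, matching the blocks the question asked for. Since only identical types share a key, a row holding a float or a bool in that position would start a new group.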
