I am trying to segment a list of lists into blocks based on the type of the data at each sublist's second index (namely NoneType or int).

Example Data:

arr = [
[81, None, None],
[82, None, None],
[83, None, None],
[84, None, None],
[85, 161, 360],
[86, 161, 360],
[87, 161, 360],
[88, 160, 360],
[89, 160, 360],
[90, 160, 360],
[91, 160, 360],
[92, 160, 360],
[93, None, None],
[94, None, None],
[95, None, None],
[96, 153, 359],
[97, 153, 359],
[98, 153, 359],
[99, 153, 359]]

This can be treated as a list of lists, as I said, or as a numpy array (i.e., numpy.array(arr)). Whichever is easier.

I am trying for something like this (doesn't need to be identical):

[(81, 84, None),                   # or [[None, None], [None, None], ...]; either is fine.
 (85, 92, [[161, 360], [161, 360], ...]),
 (93, 95, None),
 (96, 99, [[153, 359], [153, 359], ...])
]

Sloppy attempt:

none_end = 0
none_start = False
blocks_loc = list()
for i in arr:
    if None in i:
        if not none_start:
            none_start = i[0]
        none_end = i[0]
    elif None not in i and none_start is not False:
        blocks_loc.append((none_start, none_end))
        none_start = False
        none_end = 0

Then I could simply pull out the data based on blocks_loc (which now contains [(81, 84), (93, 95)]).

However, it is hard to put into words just how terrible and ugly that code is. Something better would be great.

1 Answer

I might use itertools.groupby:

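from itertools import groupby
# Group consecutive rows that share the same value at index 1.
groups = (list(g) for _, g in groupby(arr, key=lambda x: x[1]))
# For each run: (first id, last id, remaining columns of every row).
final = [(g[0][0], g[-1][0], [x[1:] for x in g]) for g in groups]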

which gives me

>>> pprint.pprint(final)
[(81, 84, [[None, None], [None, None], [None, None], [None, None]]),
 (85, 87, [[161, 360], [161, 360], [161, 360]]),
 (88, 92, [[160, 360], [160, 360], [160, 360], [160, 360], [160, 360]]),
 (93, 95, [[None, None], [None, None], [None, None]]),
 (96, 99, [[153, 359], [153, 359], [153, 359], [153, 359]])]

...and I just noticed that I was using x[1] as the index to group on, and you want x[2] instead. Well, that's left as an exercise for the reader. ;-)

If you wanted finer control over the output (e.g. to handle the case where the start and end indices are the same), it'd be easier just to loop over the key/group pairs returned by groupby and yield whatever you like.
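
A minimal sketch of that loop, assuming the blocks should split only on None versus non-None at index 1. The name iter_blocks and the shortened sample data are made up for illustration:

```python
from itertools import groupby

def iter_blocks(rows):
    """Yield (start, end, payload) for each contiguous run of rows."""
    # The key is True for rows whose index-1 value is None, False otherwise.
    for is_none, group in groupby(rows, key=lambda row: row[1] is None):
        group = list(group)
        start, end = group[0][0], group[-1][0]
        # None blocks carry no payload; other blocks keep their data columns.
        payload = None if is_none else [row[1:] for row in group]
        yield start, end, payload

sample = [[81, None, None], [82, None, None],
          [85, 161, 360], [86, 160, 360],
          [93, None, None]]
blocks = list(iter_blocks(sample))
# [(81, 82, None), (85, 86, [[161, 360], [160, 360]]), (93, 93, None)]
```

Because the key only tests for None, the rows holding 161 and 160 land in the same block, and a single-row block simply has equal start and end values.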

Also note that groupby finds contiguous groups. If your data is not necessarily contiguous, you could sort first instead.
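
As a rough illustration of sorting first, assuming the goal is to gather all None rows and all non-None rows regardless of where they sit in the list (is_none and the tiny rows list are hypothetical). Sorting on x[1] directly would raise a TypeError in Python 3, since None and int cannot be ordered against each other, so this sketch sorts on the same boolean key it groups by:

```python
from itertools import groupby

rows = [[1, None], [2, 7], [3, None], [4, 7]]

def is_none(row):
    # Same key for sorting and grouping, so equal keys end up adjacent.
    return row[1] is None

# sorted() is stable, so rows keep their original order within each group.
ordered = sorted(rows, key=is_none)
grouped = {key: [row[0] for row in group]
           for key, group in groupby(ordered, key=is_none)}
# {False: [2, 4], True: [1, 3]}
```

Note that sorting discards the notion of contiguous blocks, so this only makes sense when the position of a row in the original list does not matter.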

3 Comments

Not quite there: your grouping doesn't handle the int data case right. You need to change the groupby key to key=lambda x: type(x[1]).
@f5r5e5d: sure, the OP can pick whatever condition he wants to use as a keyfunc.
I did end up editing it a little, but it got me very close. Many thanks.