6

I have a list of values that is the result of merging many files. I need to pad some of the values. I know that each sub-section begins with the value -1. I am trying to extract each sub-array between the -1's in the main list by iterating over it.

For example, suppose this is the main list:

-1 1 2 3 4 5 7 -1 4 4 4 5 6 7 7 8 -1 0 2 3 5 -1

I would like to extract the values between the -1s:

list_a = 1 2 3 4 5 7
list_b = 4 4 4 5 6 7 7 8
list_c = 0 2 3 5 ...
list_n = a1 a2 a3 ... aM

I have extracted the indices for each -1 by searching through the main list:

minus_ones = [i for i, j in enumerate(q) if j == -1]

I also assembled them as pairs using a common recipe:

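from itertools import izip, tee  # Python 2; on Python 3 use the built-in zip

def pairwise(iterable):
    # Yield consecutive overlapping pairs: (s0, s1), (s1, s2), ...
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)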

for index in pairwise(minus_ones):
    print index

The next step I am trying to do is grab the values between the index pairs, for example:

 list_b: (7, 16) -> 4 4 4 5 6 7 7 8

so that I can then do some work on those values (I will add a fixed integer to each value in each sub-array).
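
For illustration, the step described above could be sketched as follows. This is Python 3 (the built-in zip in place of izip), and the offset of 10 is a hypothetical stand-in for the fixed integer; it is one possible approach, not necessarily the best one.

```python
from itertools import tee

def pairwise(iterable):
    # Same recipe as in the question, using the built-in zip (Python 3).
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

q = [-1, 1, 2, 3, 4, 5, 7, -1, 4, 4, 4, 5, 6, 7, 7, 8, -1, 0, 2, 3, 5, -1]
minus_ones = [i for i, value in enumerate(q) if value == -1]

offset = 10  # hypothetical fixed integer to add to every value

sub_lists = []
for start, stop in pairwise(minus_ones):
    # Skip the -1 at `start`; the slice ends just before the -1 at `stop`.
    chunk = q[start + 1:stop]
    sub_lists.append([value + offset for value in chunk])

print(sub_lists)
```

Each pair such as (7, 16) becomes the slice q[8:16], which is exactly the run of values between two consecutive -1 markers.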

1
  • 2
    There should be a way to build the list of lists while merging the files thus avoiding this problem in the first place. Commented Jan 30, 2014 at 23:13

4 Answers

4

You mentioned numpy in the tags. If you're using it, have a look at np.split.

For example:

import numpy as np

x = np.array([-1, 1, 2, 3, 4, 5, 7, -1, 4, 4, 4, 5, 6, 7, 7, 8, -1, 0, 2,
               3, 5, -1])
arrays = np.split(x, np.where(x == -1)[0])
arrays = [item[1:] for item in arrays if len(item) > 1]

This yields:

[array([1, 2, 3, 4, 5, 7]),
 array([4, 4, 4, 5, 6, 7, 7, 8]),
 array([0, 2, 3, 5])]

What's going on is that where yields an array (actually a tuple of arrays, hence the where(blah)[0]) of the indices where the given expression is true. We can then pass these indices to split to get a sequence of arrays.

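However, each resulting array still begins with its -1 marker, and if the sequence starts with -1 the first array is empty (likewise, a trailing -1 produces an array holding only that -1). Therefore, we need to filter these out and drop the leading -1 from each remaining array.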
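
To make that concrete, here is a small sketch of what np.split returns before any filtering. The shortened input array is a hypothetical example, chosen only to keep the output easy to read.

```python
import numpy as np

x = np.array([-1, 1, 2, -1, 4, 4, -1])
raw = np.split(x, np.where(x == -1)[0])

# Each piece starts at a -1, and the piece before the first -1 is empty.
print([piece.tolist() for piece in raw])
# [[], [-1, 1, 2], [-1, 4, 4], [-1]]

# Keep only pieces that hold more than the marker, then drop the marker.
cleaned = [piece[1:] for piece in raw if len(piece) > 1]
print([piece.tolist() for piece in cleaned])
# [[1, 2], [4, 4]]
```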

If you're not already using numpy, though, your (or @DSM's) itertools solution is probably a better choice.

3 Comments

Interesting. If I then wanted to pad all arrays after the first, would it look kind of like this loop: for i in arrays: new = len(array[i-1]) + array[i]?
@user2221667 - By "pad", do you mean make all the arrays the same length? (e.g. [0, 2, 3, 5] --> [0, 2, 3, 5, -1, -1, -1, -1]) If so, have a look at numpy.pad. However, at that point, it might make more sense to just use a 2D array of the right size and set the first N elements of each row to the values in your array. For example: x = np.zeros((num_arrays, max_len)); for i, item in enumerate(arrays): x[i, :len(item)] = item. (That's a bit unreadable in comment form, sorry.)
Thanks. OK, ONE more question. Let's say I make these lists and do some work on them; how do I get them back together at the end so that each array sits between -1's? Should I use np.concatenate?
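
The suggestions in these comments can be sketched together as follows. This assumes that "pad" means giving every array the same length; the fill value of 0 and the offset of 10 are hypothetical choices, and np.concatenate is only one way to put the -1 markers back.

```python
import numpy as np

arrays = [np.array([1, 2, 3, 4, 5, 7]),
          np.array([4, 4, 4, 5, 6, 7, 7, 8]),
          np.array([0, 2, 3, 5])]

# Pad: copy each array into the start of a row of a zero-filled 2D array.
max_len = max(len(item) for item in arrays)
padded = np.zeros((len(arrays), max_len), dtype=int)
for i, item in enumerate(arrays):
    padded[i, :len(item)] = item

# Do some work on each sub-array (here: add a hypothetical fixed offset).
offset = 10
shifted = [item + offset for item in arrays]

# Reassemble: put a -1 before every sub-array and one at the very end.
pieces = []
for item in shifted:
    pieces.append([-1])
    pieces.append(item)
pieces.append([-1])
merged = np.concatenate(pieces)
print(merged)
```

The merged result has the same layout as the original main list, with each modified sub-array sitting between two -1 markers.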
4

If you only need the groups themselves and don't care about the indices of the groups (you could always reconstruct them, after all), I'd use itertools.groupby:

>>> from itertools import groupby
>>> seq = [-1, 1, 2, 3, 4, 5, 7, -1, 4, 4, 4, 5, 6, 7, 7, 8, -1, 0, 2, 3, 5, -1]
>>> groups = [list(g) for k,g in groupby(seq, lambda x: x != -1) if k]
>>> groups
[[1, 2, 3, 4, 5, 7], [4, 4, 4, 5, 6, 7, 7, 8], [0, 2, 3, 5]]

I missed the numpy tags, though: if you're working with numpy arrays, using np.split/np.where is a better choice.
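
To see why the if k filter is enough, it can help to look at what groupby produces before filtering. A small sketch, using a shortened hypothetical sequence:

```python
from itertools import groupby

seq = [-1, 1, 2, -1, 4, 4, -1]

# groupby collects consecutive items that share the same key value.
# Here the key is True for ordinary values and False for the -1 separators.
runs = [(key, list(group)) for key, group in groupby(seq, lambda x: x != -1)]
print(runs)
# [(False, [-1]), (True, [1, 2]), (False, [-1]), (True, [4, 4]), (False, [-1])]

# Keeping only the True runs leaves the values between the separators.
groups = [values for key, values in runs if key]
print(groups)
# [[1, 2], [4, 4]]
```

Note that two consecutive -1's fall into a single False run, so an empty section between them produces no group at all.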

0

I would do it something like this, which is a little different from the path you started down:

input_list = [-1,1,2,3,4,5,7,-1,4,4,4,5,6,7,7,8,-1,0,2,3,5,-1]

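new_lists = []
for value in input_list:
    if value == -1:
        # Each -1 opens a new sub-list.
        new_lists.append([])
    else:
        # Assumes the input starts with -1, so a sub-list already exists.
        new_lists[-1].append(value)

# The trailing -1 opens one extra, empty sub-list; drop any empty ones.
new_lists = [sub for sub in new_lists if sub]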

0

I think that when you build your list, you can add the values directly to a string instead. So rather than starting with a list like xx = [], you can start with xx = '' and then update it with xx = xx + ' ' + str(val). The result will be a string rather than a list. Then you can just use the split() method on the string.

In [48]: xx
Out[48]: '-1 1 2 3 4 5 7 -1 4 4 4 5 6 7 7 8 -1 0 2 3 5 -1'

In [49]: xx.split('-1')
Out[49]: ['', ' 1 2 3 4 5 7 ', ' 4 4 4 5 6 7 7 8 ', ' 0 2 3 5 ', '']

In [50]: xx.split('-1')[1:-1]
Out[50]: [' 1 2 3 4 5 7 ', ' 4 4 4 5 6 7 7 8 ', ' 0 2 3 5 ']

I'm sure you can take it from here.
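
One possible way to finish this off, sketched in Python 3, is to split each piece on whitespace and convert the tokens back to int. This relies on the assumption that no value other than the separator contains the text -1; a value such as -10 or -11 would be cut apart by split('-1').

```python
xx = '-1 1 2 3 4 5 7 -1 4 4 4 5 6 7 7 8 -1 0 2 3 5 -1'

# Drop the empty strings before the first and after the last '-1'.
pieces = xx.split('-1')[1:-1]

# str.split() with no argument splits on runs of whitespace.
lists = [[int(token) for token in piece.split()] for piece in pieces]
print(lists)
# [[1, 2, 3, 4, 5, 7], [4, 4, 4, 5, 6, 7, 7, 8], [0, 2, 3, 5]]
```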
