I have a list of lists in which the first element of each inner list holds indices and the second element holds the value for each of those indices (note: each element is a numpy array).

mylist = [[np.array([1, 2, 4, 5]), np.array([0.1, 0.7, 0.7, 0.7])],
          [np.array([0, 3]), np.array([0.2, 0.4])]]

So my final output should be an array like this.

[0.2, 0.1, 0.7, 0.4, 0.7, 0.7]

I know the length of the array in advance. In the example above, the length is 6.

So I defined a numpy array as follows:

import numpy as np
np.empty(6, dtype=object)

I am wondering whether it is possible to fill the numpy array with all the indices of an inner list at once in each iteration, rather than filling each index one by one.

I am happy to provide more details if needed.

3 Answers

4

This should work, if I understood the structure of mylist correctly:

>>> idcs, vals = np.hstack(mylist)
>>> vals[idcs.argsort()]
array([0.2, 0.1, 0.7, 0.4, 0.7, 0.7])
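To see why this works, here is a small sketch that spells out the intermediate steps. It assumes mylist holds numpy arrays as described in the question, and that the indices together form a permutation of 0..n-1 with no gaps or repeats (otherwise the argsort trick would not place values correctly).

```python
import numpy as np

# Sample data in the shape described in the question (assumed: each inner
# element is a numpy array).
mylist = [
    [np.array([1, 2, 4, 5]), np.array([0.1, 0.7, 0.7, 0.7])],
    [np.array([0, 3]), np.array([0.2, 0.4])],
]

# np.hstack turns each [indices, values] pair into a 2 x k block and joins
# the blocks side by side, giving one 2 x 6 array.
stacked = np.hstack(mylist)
idcs, vals = stacked          # first row: indices, second row: values

order = idcs.argsort()        # positions that would sort the indices
result = vals[order]          # values rearranged into index order
print(stacked.shape)
print(result)
```

Since the indices are a permutation, sorting them puts each value in the slot its index names.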

Edit: As Paul Panzer points out in the comments, the sorting operation is unnecessary. If you're not working with big data sets I doubt you will see a difference, but here is another method that should be linear time:

>>> idcs, vals = np.hstack(mylist)
>>> out = np.zeros(len(idcs))
>>> out[idcs.astype(int)] = vals
>>> out
array([0.2, 0.1, 0.7, 0.4, 0.7, 0.7])

Though I don't like it as much because of the type conversion.
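For anyone wondering where the type conversion comes from, here is a minimal sketch, again assuming mylist holds numpy arrays as in the question. Stacking integer indices and float values into one array promotes everything to float, and numpy does not accept float arrays as indices.

```python
import numpy as np

mylist = [
    [np.array([1, 2, 4, 5]), np.array([0.1, 0.7, 0.7, 0.7])],
    [np.array([0, 3]), np.array([0.2, 0.4])],
]

idcs, vals = np.hstack(mylist)
print(idcs.dtype)             # indices now share the float dtype of the values

out = np.zeros(len(idcs))
try:
    out[idcs] = vals          # float arrays are rejected as indices
    float_index_worked = True
except IndexError:
    float_index_worked = False

out[idcs.astype(int)] = vals  # casting back to int makes the assignment valid
print(float_index_worked, out)
```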

Edit: Another one, without type conversion:

>>> idcs, vals = map(np.hstack, zip(*mylist))
>>> out = np.zeros(len(idcs))
>>> out[idcs] = vals
>>> out
array([0.2, 0.1, 0.7, 0.4, 0.7, 0.7])
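
Unpacked step by step, the last version might look like the sketch below. The names idx_parts and val_parts are mine, and the length 6 is taken from the question, where it is known in advance. zip(*mylist) regroups the pairs into one tuple of index arrays and one tuple of value arrays, so the two kinds of data are never mixed in a single array and the indices keep their integer dtype.

```python
import numpy as np

mylist = [
    [np.array([1, 2, 4, 5]), np.array([0.1, 0.7, 0.7, 0.7])],
    [np.array([0, 3]), np.array([0.2, 0.4])],
]

# Regroup: all index arrays in one tuple, all value arrays in another.
idx_parts, val_parts = zip(*mylist)

idcs = np.hstack(idx_parts)   # stays integer, so it can be used as an index
vals = np.hstack(val_parts)

out = np.zeros(6)             # length known in advance, as in the question
out[idcs] = vals              # one fancy-indexed assignment fills every slot
print(idcs.dtype.kind, out)
```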

7 Comments

Thank you. I was looking for something like this. :)
This is a really clever answer and a great example of np.hstack + argsort()
@PeptideWitch it is also an O(n log n) solution to an O(n) problem.
@PaulPanzer Fair enough—I've added another solution that should be O(n)
@Seb you can avoid the (back) type conversion you don't like by not using hstack in the first place.
3

Here are some timings for three O(n) approaches (@Seb's hstack, a concatenate-based solution and a simple loop):

[Figure: benchmark timings of the loop, concat, hstack2 and hstack functions against total size]

Code to produce the graph:

from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np

B = BenchmarkBuilder()

@B.add_function()
def loop(L,n):
    out = np.empty(n)
    for idx,data in L:
        out[idx] = data
    return out

@B.add_function()
def concat(L,n):
    idx,data = map(np.concatenate,zip(*L))
    out = np.empty_like(data)
    out[idx] = data
    return out

@B.add_function()
def hstack2(L,n):
    idx,data = map(np.hstack,zip(*L))
    out = np.empty_like(data)
    out[idx] = data
    return out

@B.add_function()
def hstack(L,n):
    idx,data = np.hstack(L)
    out = np.empty_like(data)
    out[idx.astype(int)] = data
    return out


@B.add_arguments('total size')
def argument_provider():
    for exp in range(2,20):
        sz = int(2**exp)
        szs = np.random.randint(1,10,sz)
        SZS = szs.cumsum()
        idx = np.split(np.random.permutation(SZS[-1]),SZS[:-1])
        data = np.arange(1,SZS[-1]+1)*0.1
        yield SZS[-1], MultiArgument([[[i,data[i]] for i in idx],SZS[-1]])

r = B.run()
r.plot()

import pylab
pylab.savefig('unchop.png')
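
If you don't have simple_benchmark installed, a rough comparison can be sketched with the standard-library timeit module. This is only an illustration for a single input size, not a replacement for the graph: the chunk count (1000), the seed and the repeat count (20) are arbitrary choices of mine, and absolute timings will vary by machine.

```python
import timeit
import numpy as np

def loop(L, n):
    out = np.empty(n)
    for idx, data in L:
        out[idx] = data
    return out

def concat(L, n):
    idx, data = map(np.concatenate, zip(*L))
    out = np.empty_like(data)
    out[idx] = data
    return out

# Build one test input the same way the argument provider above does:
# a random permutation of 0..n-1 split into chunks of random size.
rng = np.random.default_rng(0)
sizes = rng.integers(1, 10, 1000)
bounds = sizes.cumsum()
n = int(bounds[-1])
chunks = np.split(rng.permutation(n), bounds[:-1])
data = np.arange(1, n + 1) * 0.1
L = [[i, data[i]] for i in chunks]

# Both functions should reconstruct the original data array.
same = np.array_equal(loop(L, n), concat(L, n))
print("results agree:", same)

repeats = 20
for fn in (loop, concat):
    total = timeit.timeit(lambda: fn(L, n), number=repeats)
    print(f"{fn.__name__}: {total / repeats * 1e6:.1f} us per call")
```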

2 Comments

I have to admit, I feel a bit silly because the simple loop didn’t even occur to me
@Seb Don't beat yourself up over it. We keep getting told don't do loops in numpy. It takes some experience to get a feeling for where the exceptions may lie.
1

You can use the length of the array as your guide to pulling out the correct index:

new_list = []
for i in range(6):
    for x in mylist:
        if i in x[0]:
            # position of index i within this sub-list's index array
            pos = np.where(x[0] == i)[0][0]
            new_list.append(x[1][pos])

new_array = np.asarray(new_list)
