I am trying to convert a generator to a numpy array. I apply a map function on a list of data and the result is a generator. I tried doing list(map()) and then creating the numpy vector but it takes a long time. I saw somewhere that I can directly use np.fromiter to create a numpy vector from my generator. However, I run into this error:

ValueError: setting an array element with a sequence.

I found out that the error arises because my generator yields a list of lists, like [[1,2,3], [4,5,6]], and that I should pass a proper structured dtype to fromiter(). I couldn't find a clear explanation of how to do this. Can you help me?

Here's a full example:

import numpy as np

def foo(bar):
  return [bar] * 3 # so for 4 it returns [4,4,4], ..

a = [1,2,3,4,5,6,7]
b = map(foo,a)
c = np.fromiter(b, int) # this doesn't work.
  • Please post an MCVE Commented Apr 28, 2020 at 17:16
  • ok. I'll edit the post ASAP. Commented Apr 28, 2020 at 17:20

1 Answer

To use a compound (structured) dtype, the function has to return tuples, not lists:

In [977]: def foo(bar):
     ...:   return (bar,) * 3 # so for 4 it returns (4, 4, 4), ..
     ...:
     ...: a = [1,2,3,4,5,6,7]
     ...: b = map(foo,a)
In [978]: list(b)
Out[978]: [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6), (7, 7, 7)]
In [979]: def foo(bar):
     ...:   return (bar,) * 3 # so for 4 it returns (4, 4, 4), ..
     ...:
     ...: a = [1,2,3,4,5,6,7]
     ...: b = map(foo,a) # re-create b: list(b) above exhausted the iterator
In [980]: np.fromiter(b, 'i,i,i')
Out[980]:
array([(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6),
       (7, 7, 7)], dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
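The result above is a structured array with three fields, not an ordinary 2D array. If a plain (7, 3) array is what you want, one option is `structured_to_unstructured` from `numpy.lib.recfunctions`. This helper is not part of the answer above; the following is only a minimal sketch, and the names `structured` and `plain` are made up for illustration.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

def foo(bar):
    # Return a tuple so each item matches one record of the structured dtype
    return (bar,) * 3

a = [1, 2, 3, 4, 5, 6, 7]

# Structured array with three integer fields named f0, f1, f2
structured = np.fromiter(map(foo, a), dtype='i,i,i')

# Convert the records into an ordinary 2D integer array (one row per record)
plain = structured_to_unstructured(structured)

print(plain.shape)
print(plain[3])
```

Whether this extra step is worthwhile depends on what you do next with the data; for small inputs, `np.array(list(b))` may be simpler.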

Some timings:

In [981]: %%timeit b = map(foo,a)
     ...: np.array(list(b))
     ...:
1.9 µs ± 55.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [982]: %%timeit b = map(foo,a)
     ...: np.fromiter(b, 'i,i,i')
     ...:
17.2 µs ± 9.72 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2 Comments

Thanks! So it isn't faster, as I had thought.
Generators are most useful when you string several together. They delay evaluation, but sooner or later you have to generate all the values.
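To illustrate stringing several lazy steps together, here is a small sketch that filters, maps, and flattens before handing a single flat stream to `np.fromiter`. The filtering step and the names `evens`, `triples`, `flat`, and `result` are assumptions made up for this example. Flattening with `itertools.chain.from_iterable` also sidesteps the structured dtype, so `foo` can keep returning lists.

```python
from itertools import chain

import numpy as np

def foo(bar):
    return [bar] * 3  # lists are fine here because the stream is flattened

a = [1, 2, 3, 4, 5, 6, 7]

evens = (x for x in a if x % 2 == 0)   # lazy filter, nothing computed yet
triples = map(foo, evens)              # lazy map over the filtered values
flat = chain.from_iterable(triples)    # lazy flatten into single integers

# Everything is evaluated only here, when fromiter consumes the stream
result = np.fromiter(flat, dtype=int).reshape(-1, 3)

print(result)
```

No intermediate list is built, but every value still has to be produced once, so this is not guaranteed to be faster than `np.array(list(b))`.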
