Let A1 and A2 be NumPy arrays of the same shape, say (d1, d2). I want to build a (d1, d1) array from them whose [i, j]th entry is obtained by applying a function to the pair A1[i], A2[j]. I use np.fromfunction in the form

f=lambda i,j: np.inner(A1[i],A2[j])
A=np.fromfunction(f, shape=(d1, d1)) 

(as suggested in "Fastest way to initialize numpy array with values given by function").

However, I get the error "IndexError: arrays used as indices must be of integer (or boolean) type". This is strange, because changing the lambda function to, for example,

f=lambda i,j: i*j

works fine! It seems that calling another function inside the lambda leads to trouble with np.fromfunction (np.inner is just an example; I'd like to be able to replace it with other such functions).

1 Answer

To debug the situation, make f a proper function and add a print statement to see the value of i and j:

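import numpy as np
np.random.seed(2015)
d1, d2 = 5, 3
A1 = np.random.random((d1,d2))
A2 = np.random.random((d1,d2))
def f(i, j):
    # show what np.fromfunction actually passes in
    print(i, j)
    return np.inner(A1[i],A2[j])
# still raises the IndexError, but only after printing i and j
A = np.fromfunction(f, shape=(d1, d1))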

You'll see (i, j) equals:

(array([[ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.]]), array([[ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.],
       [ 0.,  1.,  2.,  3.,  4.]]))

Aha. The problem is that these arrays are float-valued. As the error message says, indices have to be of integer or boolean type.
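The same thing can be confirmed without reading the printed arrays by recording what np.fromfunction hands to the callback. The sketch below uses a hypothetical helper named probe (not part of the original question). It suggests that the callback is invoked once with whole coordinate arrays, float-valued by default, which is also why i*j works: multiplying two float arrays is fine, and only indexing with them fails.

```python
import numpy as np

seen = []  # (dtype, shape) of the first coordinate array, one entry per call

def probe(i, j):
    # hypothetical helper: record what np.fromfunction passes in
    seen.append((i.dtype, i.shape))
    return i * j  # elementwise product of the two coordinate arrays

result = np.fromfunction(probe, (3, 3))

print(len(seen))  # number of times probe was called
print(seen[0])    # dtype and shape of the coordinate array i
```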

Perusing the docstring for np.fromfunction reveals it has a third parameter, dtype, which controls the data type of coordinate arrays:

Parameters
dtype : data-type, optional
    Data-type of the coordinate arrays passed to `function`.
    By default, `dtype` is float.

Therefore, the fix for the IndexError is to pass dtype=int to np.fromfunction:

A = np.fromfunction(f, shape=(d1, d1), dtype=int) 
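One caveat worth checking, sketched here with the same A1, A2, and f (minus the print): with dtype=int the IndexError goes away, but f still receives whole index arrays, so np.inner(A1[i], A2[j]) contracts two (d1, d1, d2) arrays and returns shape (d1, d1, d1, d1) rather than (d1, d1). The variant f_rowwise is an illustrative name, not from the original answer; it multiplies the gathered rows elementwise and sums over the last axis, which appears to give the intended result.

```python
import numpy as np

np.random.seed(2015)
d1, d2 = 5, 3
A1 = np.random.random((d1, d2))
A2 = np.random.random((d1, d2))

def f(i, j):
    # i and j are (d1, d1) integer arrays, so A1[i] and A2[j] are (d1, d1, d2)
    return np.inner(A1[i], A2[j])

def f_rowwise(i, j):
    # illustrative alternative: pair row A1[i] with row A2[j] entry by entry
    return (A1[i] * A2[j]).sum(axis=-1)

A_raw = np.fromfunction(f, shape=(d1, d1), dtype=int)
A = np.fromfunction(f_rowwise, shape=(d1, d1), dtype=int)

print(A_raw.shape)  # (5, 5, 5, 5)
print(A.shape)      # (5, 5)
print(np.allclose(A, np.inner(A1, A2)))  # True
```

Since the last check shows the result equals np.inner(A1, A2), the direct call is the simpler choice whenever the function really is an inner product.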

5 Comments

Sorry, I'm still confused: with 'i*j' as the function, one gets arrays of integers '(i,j)', whereas with np.inner one gets what you wrote? Shouldn't 'np.fromfunction' just apply 'f' to all tuples '(i,j)' that are combinations of indices (between 0 and d1)?
I don't think np.fromfunction is the right function for this purpose. You see what the indices i and j look like. They are superfluous -- no i,j or f is necessary -- since the entire computation can be done with np.inner(A1,A2).
Regarding: "Shouldn't 'np.fromfunction' just apply 'f' to all tuples '(i,j)'" No, this is not what np.fromfunction does. NumPy has no fast way to do this (np.vectorize exists, but only as a convenience wrapper around a Python-level loop) because calling a Python function f for each tuple would be terribly slow for large arrays. To leverage NumPy effectively, you generally want to express the computation with the fewest function calls necessary, and pass the biggest array possible to those functions. This off-loads the most work to NumPy's fast underlying C/Fortran functions and relies the least on slower Python code.
Don't try to express the computation element-by-element (as you would in C). Instead try to find the NumPy function which achieves the same result while operating on whole arrays.
Yes, thanks! I wasn't aware of this. I just benchmarked np.inner(A1,A2) versus a double loop over the indices with A[i,j]=np.inner(A1[i],A2[j]) and your solution is MUCH, MUCH faster!
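For functions other than np.inner, the whole-array advice in the comments above can often be followed with broadcasting. The sketch below is only an illustration: pairwise Euclidean distance is a hypothetical stand-in for "other such functions", and the names diff, D, and D_loop are not from the original thread. It compares the broadcast result against the explicit double loop.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 5, 3
A1 = rng.random((d1, d2))
A2 = rng.random((d1, d2))

# Broadcast (d1, 1, d2) against (1, d1, d2) to form every row pair at once
diff = A1[:, None, :] - A2[None, :, :]   # shape (d1, d1, d2)
D = np.sqrt((diff ** 2).sum(axis=-1))    # shape (d1, d1)

# Reference: the element-by-element double loop the comments advise against
D_loop = np.empty((d1, d1))
for i in range(d1):
    for j in range(d1):
        D_loop[i, j] = np.linalg.norm(A1[i] - A2[j])

print(D.shape)
print(np.allclose(D, D_loop))
```

The intermediate array diff holds d1 * d1 * d2 values, so for very large inputs this trades memory for speed.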
