2

Whenever I need to test a moderately complex numpy expression, say,

c = np.multiply.outer(a, b)
d = np.einsum('kjij->ijk', c)

I end up doings hacks such as, e.g., setting a and bthus

a = np.arange(9).reshape(3,3)
b = a / 10

so that I can then track what d contains.

This is ugly and not very convenient. Ideally, I would be able to do something like the following:

a = np.array(list("abcdefghi")).reshape(3,3)
b = np.array(list("ABCDEFGHI")).reshape(3,3)

c = np.add.outer(a, b)
d = np.einsum('kjij->ijk', c)

so that, e.g., d[0, 1, 2] could be seen to correspond to 'hB', which is much clearer than .7 (which is what the other assignment to a and b would give.) This cannot be done, because the ufunc add does not take characters.

In summary, once I start chaining a few transformations (an outer product, an einsum, broadcasting or slicing, etc.) I lose track and need to see for myself what my transformations are actually doing. That's when I need to run a few examples, and that's where my current method of doing so strikes me as suboptimal. Is there any standard, or better, way to do this?

9
  • 1
    Sorry, you'll explain your problem more. I don't see what's hackie. Accepted numpy SO answers do that kind of test/verification all the time. Commented May 17, 2018 at 10:53
  • @hpaulj, I meant a hack as in, an ugly solution. Commented May 17, 2018 at 11:00
  • 2
    I understand hack implied ugly, but I still don't see what's wrong. Nor what you are trying to do with strings. Commented May 17, 2018 at 11:05
  • Thanks. I have edited my question; hopefully what I am after will now be clearer. Commented May 17, 2018 at 11:20
  • 1
    docs.scipy.org/doc/numpy-1.14.0/reference/generated/… - If I have time after work, I will see if it is unclear and propose to add something. Or maybe you want to do a pull request? Commented May 17, 2018 at 11:37

1 Answer 1

2
In [454]: a = np.array(list("abcdefghi")).reshape(3,3)
     ...: b = np.array(list("ABCDEFGHI")).reshape(3,3)

np.add can't be used because add has not been defined for the string dtype:

In [455]: c = np.add.outer(a,b)
....
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')

But np.char has functions that apply Python string methods to ndarray elements (these aren't fast, just convenient):

Signature: np.char.add(x1, x2)
Docstring:
Return element-wise string concatenation for two arrays of str or unicode.

Using broadcasting I can perform your outer string concatenation:

In [457]: c = np.char.add(a[:,:,None,None], b[None,None,:,:])
In [458]: c.shape
Out[458]: (3, 3, 3, 3)
In [459]: c
Out[459]: 
array([[[['aA', 'aB', 'aC'],
         ['aD', 'aE', 'aF'],
         ['aG', 'aH', 'aI']],

        [['bA', 'bB', 'bC'],
         ['bD', 'bE', 'bF'],
         ['bG', 'bH', 'bI']],

        ....
        [['iA', 'iB', 'iC'],
         ['iD', 'iE', 'iF'],
         ['iG', 'iH', 'iI']]]], dtype='<U2')

I was skeptical that einsum could handle this array, since normally einsum is used for np.dot like sum-of-products calculations. But with this indexing, it is just selecting a diagonal and rearranging axes, so it does work:

In [460]: np.einsum('kjij->ijk', c)
Out[460]: 
array([[['aA', 'dA', 'gA'],
        ['bB', 'eB', 'hB'],
        ['cC', 'fC', 'iC']],

       [['aD', 'dD', 'gD'],
        ['bE', 'eE', 'hE'],
        ['cF', 'fF', 'iF']],

       [['aG', 'dG', 'gG'],
        ['bH', 'eH', 'hH'],
        ['cI', 'fI', 'iI']]], dtype='<U2')

The d from the numeric test case:

array([[[0. , 3. , 6. ],
        [1.1, 4.1, 7.1],
        [2.2, 5.2, 8.2]],

       [[0.3, 3.3, 6.3],
        [1.4, 4.4, 7.4],
        [2.5, 5.5, 8.5]],

       [[0.6, 3.6, 6.6],
        [1.7, 4.7, 7.7],
        [2.8, 5.8, 8.8]]])

The pattern with these numeric values is just as clear as with strings.

I like to use distinct array shapes where possible, because it makes tracking dimensions across changes easier:

In [496]: a3 = np.arange(1,13).reshape(4,3)
     ...: b3 = np.arange(1,7).reshape(2,3) / 10
In [497]: c3 = np.add.outer(a3,b3)
In [498]: d3 = np.einsum('kjij->ijk', c3)
In [499]: c3.shape
Out[499]: (4, 3, 2, 3)
In [500]: d3.shape
Out[500]: (2, 3, 4)
In [501]: d3
Out[501]: 
array([[[ 1.1,  4.1,  7.1, 10.1],
        [ 2.2,  5.2,  8.2, 11.2],
        [ 3.3,  6.3,  9.3, 12.3]],

       [[ 1.4,  4.4,  7.4, 10.4],
        [ 2.5,  5.5,  8.5, 11.5],
        [ 3.6,  6.6,  9.6, 12.6]]])

This, for example, would raise an error if I try ''kjik->ijk'.

With numeric values I can perform the multiply.outer with einsum:

In [502]: c4 = np.multiply.outer(a3,b3)
In [503]: np.allclose(c4,np.einsum('ij,kl',a3,b3))
Out[503]: True
In [504]: d4 = np.einsum('kjij->ijk', c4)
In [505]: np.allclose(d4,np.einsum('kj,ij->ijk',a3,b3))
Out[505]: True
In [506]: d4
Out[506]: 
array([[[0.1, 0.4, 0.7, 1. ],
        [0.4, 1. , 1.6, 2.2],
        [0.9, 1.8, 2.7, 3.6]],

       [[0.4, 1.6, 2.8, 4. ],
        [1. , 2.5, 4. , 5.5],
        [1.8, 3.6, 5.4, 7.2]]])

That 'kj,ij->ijk' gives me a better of idea of what is happening than the d display.

Another way to put it:

(4,3) + (2,3) => (2,3,4)
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks a lot, that is very useful. You are right that this technique won't work when you actually need to multiply arrays. It would be very neat if there was a way to run numpy operations on symbolic arrays as a way of testing behavior.
There is a symbolic math module, sympy. It can convert its symbolic expressions into numpy equivalents, but it can't run numpy code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.