
I have the following two matrix algebra calculations inside a large iteration loop, so I am looking to optimize the calculation.

1:

    F = np.matrix(np.zeros(shape=(n+1,1)))
    F[0:n] = x - np.diag(np.array(theta)[0:n].flatten())*self.W*(theta[0:n]-self.m) + theta[0:n]*theta[n]
    F[n] = np.sum(theta[0:n]) - 1; #Lagrange multiplier term

2:

    J = np.matrix(np.zeros(shape=(n+1,n+1)))

    #Now add the dF_lambda/d(theta_i) = 1 and dF_lambda/d(lambda) = 0
    J[n,n] = 0

    #The following is correct for the off diagonal elements
    J[:n,:n] = -np.diag(np.array(theta)[0:n].flatten()) * self.W * np.diag(np.array(theta)[0:n].flatten())

    #We now update for the on diagonal elements
    J[:n,:n] = (J[:n,:n] - np.diag(np.diag(J[:n,:n])) +
           np.diag(np.array(-np.multiply(np.diag(np.diag(self.W)),np.diag(np.array(theta)[0:n].flatten())) * self.W * (theta[0:n] - self.m) + theta[n]).flatten()))

    #Finally adjust for the final columns
    J[:n,n] = theta[:n]
    J[n,:n] = 1

I'm not sure which of the numpy calls are computationally expensive. Is it possible to optimise this in Python to get close to C speed, or would I have to program it in C itself?

TEST FUNCTIONS FOR 1

import numpy as np

def _nonLinEq(m, W, x, theta):
    #This outputs the nonlinear equations in theta
    #resulting from the partial log derivative of a multivariate
    #normal prior with covariance matrix E, means m, and a multinomial
    #likelihood with observations x.

    #F = [F_1, ... F_n, F_lambda]' output values where F_i = F(theta_i)
    n = len(m)
    F = define_F(n)
    F[0:n] = assign_values_to_F(x, theta, W, m, n)
    F[n] = assign_lagrange_multiplier_term(theta, n) #Lagrange multiplier term

    return F

def define_F(n):
    return np.matrix(np.zeros(shape=(n+1,1)))

def diag_theta(theta, n):
    return np.diag(np.array(theta)[0:n].flatten())

def multiply_terms(theta, W, m, n):
    return diag_theta(theta, n)*W*(theta[0:n]-m)

def assign_values_to_F(x,theta,W,m,n):
    return x - multiply_terms(theta, W, m, n) + theta[0:n]*theta[n]

def assign_lagrange_multiplier_term(theta, n):
    return np.sum(theta[0:n]) - 1 

def test_nonLinEq():
    n = 100
    temp = np.random.rand(n)
    m = np.transpose(np.matrix(temp/np.sum(temp)))
    W = np.matrix(np.diag(np.random.rand(n)))
    x = np.transpose(np.matrix(np.floor(np.random.rand(n)*10)))
    theta = np.transpose(np.matrix(np.append(np.ones(n)/n, -1)))

    for i in range(1000):
        _nonLinEq(m, W, x, theta)

Test Functions For 2

def _jacNLE(m, W, x, theta):
    #This finds the Jacobian of our non-linear equations

    #J = (J_ij) output values where J_ij = dF_i/d(theta_j)
    n = len(m);

    J = define_J(n)

    #The following is correct for the off diagonal elements
    diag_theta = convert_theta_to_diagonal(theta, n)
    J[:n,:n] = input_off_diagonal_J(diag_theta, W)

    #We now update for the on diagonal elements
    J[:n,:n] = remove_J_diagonal(J, n) + new_diagonal(W, theta, m, diag_theta, n)

    #Finally adjust for the final columns
    J[:n,n] = theta[:n]
    J[n,:n] = 1

    return J

def define_J(n):
    return np.matrix(np.zeros(shape=(n+1,n+1)))

def convert_theta_to_diagonal(theta, n):
    return np.diag(np.array(theta)[0:n].ravel())

def input_off_diagonal_J(diag_theta, W):
    return -diag_theta * W * diag_theta

def remove_J_diagonal(J, n):
    return J[:n,:n] - np.diag(np.diag(J[:n,:n]))

def matrix_prod(W, diag_theta):
    return -np.multiply(np.diag(np.diag(W)),diag_theta)

def new_diagonal(W, theta, m, diag_theta, n):
    return np.diag(np.array(matrix_prod(W, diag_theta) * W * (theta[0:n] - m) + theta[n]).ravel())

def test_jacNLE():
    n = 2
    temp = np.random.rand(n)
    m = np.transpose(np.matrix(temp/np.sum(temp)))
    W = np.matrix(np.diag(np.random.rand(n)))
    x = np.transpose(np.matrix(np.floor(np.random.rand(n)*10)))
    theta = np.transpose(np.matrix(np.append(np.ones(n)/n, -1)))

    for i in range(1000):
        _jacNLE(m, W, x, theta)
  • Have you used cProfile on your code to see which is the most expensive? Commented May 7, 2014 at 17:09
  • In cases like this, I always first isolate the bottleneck from the rest of the code. Wrap each of these up into a function that you can call independently and profile. Add these to a test script that also generates dummy values of the appropriate size and datatype for all the input variables. Your test script should call each function, passing in these dummy values. Now you are able to time and profile your code and measure any improvements. Finally, update your question with the test script. Commented May 7, 2014 at 17:58 (a minimal cProfile sketch over those test drivers follows this comment list)
  • I did cProfile my code and it pointed out the functions containing these two bits of code. I will do as Mr E says to try and get more to the root of the issue. Commented May 7, 2014 at 21:34
    I've included my INITIAL cProfile log here dumptext.com/hNaGFjnD. As I interpret it, it is saying most of the time is spent in _nonLinEq and _jacNLE which are essentially made up of the code in 1 and 2 respectively. Commented May 7, 2014 at 21:58
  • @MrE, I've uploaded what I think you meant in your comment for case 1. I will do it for case 2 also if it is in fact what you meant. Commented May 7, 2014 at 22:01
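
A minimal cProfile run over the test drivers above might look like this (bench.py is a hypothetical module name for a file containing those test functions):

    import cProfile
    import pstats

    # Assumes the test functions above are importable from a module named bench (hypothetical)
    from bench import test_nonLinEq, test_jacNLE

    cProfile.run("test_nonLinEq()", "nle.prof")
    cProfile.run("test_jacNLE()", "jac.prof")

    for dump in ("nle.prof", "jac.prof"):
        pstats.Stats(dump).sort_stats("cumulative").print_stats(10)  # top 10 entries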

2 Answers


I recently wanted to optimize numpy code myself.

First, I would try the line_profiler package. It allows you to measure the time spent in each line. The usage looked a little tricky at first, but in the end it is a nice tool and you can detect the bottleneck for sure.
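
For example, a minimal sketch of how this might look for snippet 1 from the question (assuming line_profiler is installed; the file name profile_eqs.py and the driver values are my own, copied from the test function above):

    # profile_eqs.py -- run with:  kernprof -l -v profile_eqs.py
    # The @profile decorator is injected by kernprof at run time; it is not imported.
    import numpy as np

    @profile
    def _nonLinEq(m, W, x, theta):
        n = len(m)
        F = np.matrix(np.zeros(shape=(n+1, 1)))
        F[0:n] = x - np.diag(np.array(theta)[0:n].flatten())*W*(theta[0:n] - m) + theta[0:n]*theta[n]
        F[n] = np.sum(theta[0:n]) - 1   # Lagrange multiplier term
        return F

    if __name__ == "__main__":
        n = 100
        temp = np.random.rand(n)
        m = np.matrix(temp/np.sum(temp)).T
        W = np.matrix(np.diag(np.random.rand(n)))
        x = np.matrix(np.floor(np.random.rand(n)*10)).T
        theta = np.matrix(np.append(np.ones(n)/n, -1)).T
        for _ in range(1000):
            _nonLinEq(m, W, x, theta)

kernprof then prints per-line hit counts and timings for the decorated function.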

Another tool that might speed up your code is numba. It just-in-time compiles your code if you supply some type information.
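
A hedged numba sketch of snippet 1 (my own rewrite: numba's nopython mode works with plain ndarrays and explicit loops rather than np.matrix, so the inputs are assumed to be 1-D/2-D float arrays):

    import numpy as np
    from numba import njit

    @njit
    def non_lin_eq(m, W, x, theta):
        # m, x: (n,) arrays; W: (n, n) array; theta: (n+1,) array, last entry = lambda
        n = m.shape[0]
        F = np.empty(n + 1)
        Wv = np.dot(W, theta[:n] - m)       # W (theta - m)
        for i in range(n):
            F[i] = x[i] - theta[i] * Wv[i] + theta[i] * theta[n]
        F[n] = np.sum(theta[:n]) - 1.0      # Lagrange multiplier term
        return F

The first call pays the compilation cost; the calls inside the big iteration then run as compiled code.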

There are specific functions in numpy that I have noticed to be slow.

First of all: np.sum. It's strange, but it can even be slower than Python's built-in sum function. I once tried to avoid it by either multiplying the array I wanted to sum by a matrix(ones(N))*, or by writing a function "in" numba. I am afraid I do not remember which solution was faster.
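
The ones-matrix trick (see the [EDIT] footnote below) would look roughly like this; whether it actually beats np.sum is something to measure rather than assume:

    import numpy as np

    N = 100
    onesN = np.matrix(np.ones(N))            # built once, outside the loop
    theta = np.matrix(np.ones(N) / N).T

    total_np   = np.sum(theta)               # the usual way
    total_ones = (onesN * theta)[0, 0]       # (1 x N) * (N x 1) matrix product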

I think I read once that np.matrix is slower than the simple wrapper np.asmatrix.

If you fill an empty array anyway, you can use np.empty instead of np.zeros.
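
In snippet 1, for instance, every entry of F is overwritten right after the allocation, so the zero-fill is wasted work. A small sketch (using the test-function names W and m instead of self.W and self.m):

    # Every element of F is assigned below, so uninitialized memory is fine here
    F = np.asmatrix(np.empty(shape=(n+1, 1)))
    F[0:n] = x - np.diag(np.array(theta)[0:n].flatten())*W*(theta[0:n] - m) + theta[0:n]*theta[n]
    F[n] = np.sum(theta[0:n]) - 1   # Lagrange multiplier term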

Another approach is recompiling numpy on your machine against optimized BLAS/LAPACK libraries (until I ran into numpy performance problems I had never heard of those...). It should work, but I failed.

I have also heard of the gnumpy package, which uses CUDA and your GPU, but I would have had to install some CUDA software for that...

[EDIT:] * I predefined onesN = matrix(ones(N)) so I would not have to build it every time I wanted to sum an array. That is actually another hint: look for constants, predefine them, and reuse them...


1 Comment

The line_profiler package is really useful. I'm going to implement numba as well to see if I can get any more speedup.

I agree with the comments that you should do some profiling, but clearly one thing you could do is assign

np.diag(np.array(theta)[0:n].flatten())

to a local variable and use that in your calculations, since you are computing it several times. If n is large this is a fairly expensive computation, since you are flattening, copying to a new array, and then creating a new matrix with that array as the diagonal.
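
A sketch of that suggestion applied to snippet 2 (using the test-function names W and m instead of self.W and self.m; diag_t is my own name for the hoisted value):

    # Build the expensive diagonal matrix once and reuse it
    diag_t = np.diag(np.array(theta)[0:n].flatten())

    J = np.matrix(np.zeros(shape=(n+1, n+1)))
    J[:n, :n] = -diag_t * W * diag_t
    J[:n, :n] = (J[:n, :n] - np.diag(np.diag(J[:n, :n]))
                 + np.diag(np.array(-np.multiply(np.diag(np.diag(W)), diag_t)
                                    * W * (theta[0:n] - m) + theta[n]).flatten()))
    J[:n, n] = theta[:n]
    J[n, :n] = 1

Note that np.diag(np.diag(W)) does not change between iterations either, so it is another candidate for precomputing outside the loop.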
