
I have the following two matrix algebra calculations inside a large iteration loop, so I am looking to optimize the calculation.

1:

    F = np.matrix(np.zeros(shape=(n+1,1)))
    F[0:n] = x - np.diag(np.array(theta)[0:n].flatten())*self.W*(theta[0:n]-self.m) + theta[0:n]*theta[n]
    F[n] = np.sum(theta[0:n]) - 1; #Lagrange multiplier term

2:

    J = np.matrix(np.zeros(shape=(n+1,n+1)))

    #Now add the dF_lambda/d(theta_i) = 1 and dF_lambda/d(lambda) = 0
    J[n,n] = 0

    #The following is correct for the off diagonal elements
    J[:n,:n] = -np.diag(np.array(theta)[0:n].flatten()) * self.W * np.diag(np.array(theta)[0:n].flatten())

    #We now update for the on diagonal elements
    J[:n,:n] = (J[:n,:n] - np.diag(np.diag(J[:n,:n])) +
           np.diag(np.array(-np.multiply(np.diag(np.diag(self.W)),np.diag(np.array(theta)[0:n].flatten())) * self.W * (theta[0:n] - self.m) + theta[n]).flatten()))

    #Finally adjust for the final columns
    J[:n,n] = theta[:n]
    J[n,:n] = 1

I'm not sure which of the numpy calls are computationally expensive. Is it possible to optimise this in Python to get close to C speed, or would I have to program it in C itself?

TEST FUNCTIONS FOR 1

import numpy as np

def _nonLinEq(m, W, x, theta):
    #This outputs the nonlinear equations in theta
    #resulting from the partial log derivative of a multivariate
    #normal prior with covariance matrix E, means m, and a multinomial
    #likelihood with observations x.

    #F = [F_1, ... F_n, F_lambda]' output values where F_i = F(theta_i)
    n = len(m)
    F = define_F(n)
    F[0:n] = assign_values_to_F(x, theta, W, m, n)
    F[n] = assign_lagrange_multiplier_term(theta, n) #Lagrange multiplier term

    return F

def define_F(n):
    return np.matrix(np.zeros(shape=(n+1,1)))

def diag_theta(theta, n):
    return np.diag(np.array(theta)[0:n].flatten())

def multiply_terms(theta, W, m, n):
    return diag_theta(theta, n)*W*(theta[0:n]-m)

def assign_values_to_F(x,theta,W,m,n):
    return x - multiply_terms(theta, W, m, n) + theta[0:n]*theta[n]

def assign_lagrange_multiplier_term(theta, n):
    return np.sum(theta[0:n]) - 1 

def test_nonLinEq():
    n = 100
    temp = np.random.rand(n)
    m = np.transpose(np.matrix(temp/np.sum(temp)))
    W = np.matrix(np.diag(np.random.rand(n)))
    x = np.transpose(np.matrix(np.floor(np.random.rand(n)*10)))
    theta = np.transpose(np.matrix(np.append(np.ones(n)/n, -1)))

    for i in range(1000):
        _nonLinEq(m, W, x, theta)

Test Functions For 2

def _jacNLE(m, W, x, theta):
    #This finds the Jacobian of our non-linear equations

    #J = (J_ij) output values where J_ij = dF_i/d(theta_j)
    n = len(m);

    J = define_J(n)

    #The following is correct for the off diagonal elements
    diag_theta = convert_theta_to_diagonal(theta, n)
    J[:n,:n] = input_off_diagonal_J(diag_theta, W)

    #We now update for the on diagonal elements
    J[:n,:n] = remove_J_diagonal(J, n) + new_diagonal(W, theta, m, diag_theta, n)

    #Finally adjust for the final columns
    J[:n,n] = theta[:n]
    J[n,:n] = 1

    return J

def define_J(n):
    return np.matrix(np.zeros(shape=(n+1,n+1)))

def convert_theta_to_diagonal(theta, n):
    return np.diag(np.array(theta)[0:n].ravel())

def input_off_diagonal_J(diag_theta, W):
    return -diag_theta * W * diag_theta

def remove_J_diagonal(J, n):
    return J[:n,:n] - np.diag(np.diag(J[:n,:n]))

def matrix_prod(W, diag_theta):
    return -np.multiply(np.diag(np.diag(W)),diag_theta)

def new_diagonal(W, theta, m, diag_theta, n):
    return np.diag(np.array(matrix_prod(W, diag_theta) * W * (theta[0:n] - m) + theta[n]).ravel())

def test_jacNLE():
    n = 2
    temp = np.random.rand(n)
    m = np.transpose(np.matrix(temp/np.sum(temp)))
    W = np.matrix(np.diag(np.random.rand(n)))
    x = np.transpose(np.matrix(np.floor(np.random.rand(n)*10)))
    theta = np.transpose(np.matrix(np.append(np.ones(n)/n, -1)))

    for i in range(1000):
        _jacNLE(m, W, x, theta)
  • Have you used cProfile on your code to see which is the most expensive? Commented May 7, 2014 at 17:09
  • In cases like this, I always first isolate the bottleneck from the rest of the code. Wrap each of these up into a function that you can call independently and profile. Add these to a test script that also generates dummy values of the appropriate size and datatype for all the input variables. Your test script should call each function, passing in these dummy values. Now you are able to time and profile your code and measure any improvements. Finally, update your question with the test script. Commented May 7, 2014 at 17:58 (a minimal cProfile sketch over those test drivers follows this comment list)
  • I did cProfile my code and it pointed out the functions containing these two bits of code. I will do as Mr E says to try and get more to the root of the issue. Commented May 7, 2014 at 21:34
    I've included my INITIAL cProfile log here dumptext.com/hNaGFjnD. As I interpret it, it is saying most of the time is spent in _nonLinEq and _jacNLE which are essentially made up of the code in 1 and 2 respectively. Commented May 7, 2014 at 21:58
  • @MrE, I've uploaded what I think you meant in your comment for case 1. I will do it for case 2 also if it is in fact what you meant. Commented May 7, 2014 at 22:01
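
A minimal cProfile run over the test drivers above might look like this (bench.py is a hypothetical module name for a file containing those test functions):

    import cProfile
    import pstats

    # Assumes the test functions above are importable from a module named bench (hypothetical)
    from bench import test_nonLinEq, test_jacNLE

    cProfile.run("test_nonLinEq()", "nle.prof")
    cProfile.run("test_jacNLE()", "jac.prof")

    for dump in ("nle.prof", "jac.prof"):
        pstats.Stats(dump).sort_stats("cumulative").print_stats(10)  # top 10 entries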

2 Answers


I recently wanted to optimize numpy code myself.

First, I would try the line_profiler package. It allows you to measure the time spent in each line. The usage looked a little tricky at first, but in the end it is a nice tool and you can detect the bottleneck for sure.
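
For example, a minimal sketch of how this might look for snippet 1 from the question (assuming line_profiler is installed; the file name profile_eqs.py and the driver values are my own, copied from the test function above):

    # profile_eqs.py -- run with:  kernprof -l -v profile_eqs.py
    # The @profile decorator is injected by kernprof at run time; it is not imported.
    import numpy as np

    @profile
    def _nonLinEq(m, W, x, theta):
        n = len(m)
        F = np.matrix(np.zeros(shape=(n+1, 1)))
        F[0:n] = x - np.diag(np.array(theta)[0:n].flatten())*W*(theta[0:n] - m) + theta[0:n]*theta[n]
        F[n] = np.sum(theta[0:n]) - 1   # Lagrange multiplier term
        return F

    if __name__ == "__main__":
        n = 100
        temp = np.random.rand(n)
        m = np.matrix(temp/np.sum(temp)).T
        W = np.matrix(np.diag(np.random.rand(n)))
        x = np.matrix(np.floor(np.random.rand(n)*10)).T
        theta = np.matrix(np.append(np.ones(n)/n, -1)).T
        for _ in range(1000):
            _nonLinEq(m, W, x, theta)

kernprof then prints per-line hit counts and timings for the decorated function.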

Another tool that might speed up your code is numba. It just-in-time compiles your code if you supply some type information.
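
A hedged numba sketch of snippet 1 (my own rewrite: numba's nopython mode works with plain ndarrays and explicit loops rather than np.matrix, so the inputs are assumed to be 1-D/2-D float arrays):

    import numpy as np
    from numba import njit

    @njit
    def non_lin_eq(m, W, x, theta):
        # m, x: (n,) arrays; W: (n, n) array; theta: (n+1,) array, last entry = lambda
        n = m.shape[0]
        F = np.empty(n + 1)
        Wv = np.dot(W, theta[:n] - m)       # W (theta - m)
        for i in range(n):
            F[i] = x[i] - theta[i] * Wv[i] + theta[i] * theta[n]
        F[n] = np.sum(theta[:n]) - 1.0      # Lagrange multiplier term
        return F

The first call pays the compilation cost; the calls inside the big iteration then run as compiled code.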

There are specific functions in numpy that I have noticed to be slow.

First of all: np.sum. It's strange, but it can even be slower than Python's built-in sum function. I once tried to avoid it by either multiplying the array I wanted to sum by a matrix(ones(N))*, or by writing a function "in" numba. I am afraid I do not remember which solution was faster.
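
The ones-matrix trick (see the [EDIT] footnote below) would look roughly like this; whether it actually beats np.sum is something to measure rather than assume:

    import numpy as np

    N = 100
    onesN = np.matrix(np.ones(N))            # built once, outside the loop
    theta = np.matrix(np.ones(N) / N).T

    total_np   = np.sum(theta)               # the usual way
    total_ones = (onesN * theta)[0, 0]       # (1 x N) * (N x 1) matrix product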

I think I read once that np.matrix is slower than the simple wrapper np.asmatrix.

If you fill an empty array anyway, you can use np.empty instead of np.zeros.
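
In snippet 1, for instance, every entry of F is overwritten right after the allocation, so the zero-fill is wasted work. A small sketch (using the test-function names W and m instead of self.W and self.m):

    # Every element of F is assigned below, so uninitialized memory is fine here
    F = np.asmatrix(np.empty(shape=(n+1, 1)))
    F[0:n] = x - np.diag(np.array(theta)[0:n].flatten())*W*(theta[0:n] - m) + theta[0:n]*theta[n]
    F[n] = np.sum(theta[0:n]) - 1   # Lagrange multiplier term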

Another approach is recompiling numpy on your machine against optimized BLAS/LAPACK libraries (until I ran into numpy performance problems I had never heard of those...). It should work, but I failed.

I have also heard of the gnumpy package, which uses CUDA and your GPU, but I would have had to install some CUDA software for that...

[EDIT:] * I predefined onesN = matrix(ones(N)) so I would not have to build it every time I wanted to sum an array. That is actually another hint: look for constants, predefine them, and reuse them...


1 Comment

The line_profiler package is really useful. I'm going to implement numba as well to see if I can get any more speedup.

I agree with the comments that you should do some profiling, but clearly one thing you could do is assign

np.diag(np.array(theta)[0:n].flatten())

to a local variable and use that in your calculations, since you are computing it several times. If n is large this is a fairly expensive computation, since you are flattening, copying to a new array, and then creating a new matrix with that array as the diagonal.
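
A sketch of that suggestion applied to snippet 2 (using the test-function names W and m instead of self.W and self.m; diag_t is my own name for the hoisted value):

    # Build the expensive diagonal matrix once and reuse it
    diag_t = np.diag(np.array(theta)[0:n].flatten())

    J = np.matrix(np.zeros(shape=(n+1, n+1)))
    J[:n, :n] = -diag_t * W * diag_t
    J[:n, :n] = (J[:n, :n] - np.diag(np.diag(J[:n, :n]))
                 + np.diag(np.array(-np.multiply(np.diag(np.diag(W)), diag_t)
                                    * W * (theta[0:n] - m) + theta[n]).flatten()))
    J[:n, n] = theta[:n]
    J[n, :n] = 1

Note that np.diag(np.diag(W)) does not change between iterations either, so it is another candidate for precomputing outside the loop.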
