4

I'm really struggling to optimize a calculus code on MATLAB.

It's a heavy calculation required to obtain the material properties of a non-linear material.

This calculation requires more than 240 millions steps. It's rather simple in itself as it consist in a huge number of sum. The only problem is that the numbers are stored in various arrays and lists which is a bit confusing.

Here is the original code :

Tensor=zeros(3,3,3,3);
for m=1:3
    for n=1:3
        for o=1:3
            for p=1:3
                for x=1:16 
                    for y=1:16
                        for z=1:16 
                            for i=1:3
                                for j=1:3
                                    for k=1:3
                                        for l=1:3
                                            for r=1:3
                                                for s=1:3
                                                    sum=sum+(1/(8*(pi()^2))*P{x,y,z}(i,m)*P{x,y,z}(j,n)*P{x,y,z}(k,o)*P{x,y,z}(l,p)*(TensorC(i,j,k,l)-TensorC0(i,j,r,s))*TensorA(r,s,k,l)*sin(omega(x))*p_omega(x)*p_phi(y)*p_beta(z);
                                                end
                                            end
                                        end                                                    
                                    end
                                end
                            end
                        end
                    end
                end
                Tensor(m,n,o,p)=sum;
            end
        end
    end
end

P{x,y,z}(i,m) is a change of basis formula (same for the others) : i and m determine the type of formula and the result is calculated with the x, y and z variables.

All the other numbers in the summation are real numbers picked up in arrays and tensors.

I tried to extract as much variables from the last for loop in order to calculate them as soon as possible an reduce the number of operations:

Tensor=zeros(3,3,3,3);
CO1=1/(8*(pi()^2));
for m=1:3
    for n=1:3
        for o=1:3
            for p=1:3
            sum=C0_tensor(m,n,o,p);
                for x=1:16
                    CO7=sin(omega(x));
                    CO8=p_omega(x);
                    for y=1:16
                        CO9=p_phi(y);
                        for z=1:16
                            CO10=p_beta(z);
                            for i=1:3
                                CO2=P{x,y,z}(i,m);
                                for j=1:3
                                    CO3=P{x,y,z}(j,n);
                                    for k=1:3
                                        CO4=P{x,y,z}(k,o);
                                        for l=1:3
                                            CO5=P{x,y,z}(l,p);
                                            CO6=TensorC(i,j,k,l);
                                            for r=1:3
                                                for s=1:3
                                                    CO11=TensorC0(i,j,r,s);
                                                    CO12=TensorA(r,s,k,l);
                                                    sum=sum+CO1*CO2*CO3*CO4*CO5*(CO6-CO11)*CO12*CO7*CO8*CO9*CO10;
                                                end
                                            end
                                        end                                                    
                                    end
                                end
                            end
                        end
                    end
                end
                Tensor(m,n,o,p)=sum;
            end
        end
    end
end

Still, the calculation is much too long.

I don't see any way of parallelizing or vectorizing the calculation ...

The operation of retrieving one particular value from one array or one matrix seem to be very slow...

Do you think I should build a huge tensor containing all the values instead of using multiples one?

4
  • 1
    If really hard to avoid for loops, you should start to consider using other programming languages which favors looping, like C++ Commented Dec 12, 2013 at 5:55
  • 1
    8o ...good gracious... and good luck. Do you already have a large dependency on matlab code? Commented Dec 12, 2013 at 6:58
  • Quantify much too long. Commented Dec 12, 2013 at 9:23
  • On the scientific theory side, there is no way of avaiding thoses loops ... But I think I can find a way to cut half of the calculation thanks to the symmetry of some matrix. Now the loop runs in 9min instead of 2h (i've optimized the arrays and tensors structure). If I want to go faster, I'll have to switch in C++. We'll see in a few days if it's necessary. Commented Dec 13, 2013 at 3:57

1 Answer 1

1

You shouldn't use sum as a variable name as you're overwriting the useful function by the same name.

Taking just the inner loop here, you're calculating a single value for each r and s which is added to your output value:

  for r=1:3
         for s=1:3
              CO11=TensorC0(i,j,r,s);
              CO12=TensorA(r,s,k,l);
              sum=sum+CO1*CO2*CO3*CO4*CO5*(CO6-CO11)*CO12*CO7*CO8*CO9*CO10;
         end
  end

However, you could at once get C011/C012 as 3 x 3 matrices, do one sum and then add that to your output: (changed your sum to out here, note the .* rather than * in the appropriate place):

C011 = squeeze(TensorCO(i,j,:,:));
C012 = squeeze(TensorCO(:,:,k,l));
s = CO1*CO2*CO3*CO4*CO5*(CO6-CO11).*CO12*CO7*CO8*CO9*CO10;
out = out + sum(s(:));

Also, when you do this:

CO7=sin(omega(x));
CO8=p_omega(x);

(which is just then C07*C08 in the later equation) - you don't need to recalculate sin(omega(x)) for every n,m,p., so that can be taken out of the loop entirely.

Pre calculate sin and multiply by p_omega (outside loop):

omega78 = sin(omega).*p_omega;

Then just retrieve the value C78 = omega78(x) in the x-loop and use that instead of C07*C08.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you very much for your help ! I applied your ideas and tweaked a bit the maths and the arrays / tensors structure. Basically, the main loop is now 10x faster. I'll keep trying to find new ideas !
I optimized the tensors and arrays structure. Now the main loop runs in 9min instead of 2h !

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.