
I have been using MATLAB on a daily basis for a few years, but I recently decided to switch to Python because of its additional capabilities (e.g. machine learning libraries).

I started by translating a simple MATLAB script into Python, primarily using NumPy, and ran it in the Spyder environment. The results are striking: Python is more than 100 times slower! Both codes are attached below. For the given CSV file (gmr), Python executes the code in about 25 seconds, while MATLAB needs only a few milliseconds. More importantly, I want to repeat the same calculation for a large number of CSV files (e.g. 200): MATLAB needs around 50 seconds for 200 CSV files, while Python would need more than an hour.

Could you please suggest some modifications? What am I doing wrong here?

I have tried several alternative ways to formulate the code, but unfortunately none of them worked.

MATLAB:

clear variables; close all ; clc ;

tic;
damp=0.05;
T = (0.01:0.01:3)';              % spectral periods
num_records = 1;                 % number of ground-motion records
Spectra = cell(num_records,2);

name = 'rec_1_out_scaled.csv';
gmr = load(name);
dt = gmr(3,1) - gmr(2,1);        % time step of the record
Sa=zeros(length(T),1);
% Newmark's Direct Integration Method
for j=1:length(T)
    Sa(j)=newmark_dim(gmr(:,2),T(j),damp,dt);
end
Spectra(1,:) = {name,Sa};

toc;


function Sa = newmark_dim(gacc,T,zeta,dt)

% newmark coefficients
beta = 1/4;
gamma = 1/2;

% natural circular frequency of SDOF system
wn = 2*pi/T;

% initialization
npun = length(gacc);
y = zeros(npun,1);
yp = zeros(npun,1);
ypp = zeros(npun,1);
ypp(1) = -gacc(1) - 2*wn*zeta*yp(1) - wn^2*y(1); % initial acceleration (zero initial conditions)

% Integration coefficients
keff = wn^2 + 1/(beta*dt^2) + gamma*2*wn*zeta/(beta*dt);
a1 = 1/(beta*dt^2)+gamma*2*wn*zeta/(beta*dt);
a2 = 1/(beta*dt)+2*wn*zeta*(gamma/beta-1);
a3 = (1/(2*beta)-1)+2*wn*zeta*dt*(gamma/(2*beta)-1);

for i=1:npun-1
    y(i+1)   = (-gacc(i+1)+a1*y(i)+a2*yp(i)+a3*ypp(i))/keff;
    ypp(i+1) = (y(i+1)-y(i)-dt*yp(i)-dt^2*ypp(i)/2)/(beta*dt^2) + ypp(i);
    yp(i+1)  = yp(i)+dt*ypp(i)+dt*gamma*(ypp(i+1)-ypp(i));
end

Sd = max(abs(y));
Sa = Sd*wn^2;

end

PYTHON:

import time
import numpy as np
import math
import pandas as pd

start = time.time()

# Initialize
name = "rec_1_out_scaled.csv"
T = np.arange(0.01,3.01,0.01)
zeta = 0.05 # 5% damping
extension = 'csv'
num_records = 200
Spectra = np.zeros((num_records,T.size,2)) # Multidimensional array: record x period x (T, Sa)

"""
Newmark's Direct Integration Method
"""
def Newmark_DIM(gmr,T,zeta,dt):
    # newmark coefficients for constant acceleration
    beta = 1/4
    gamma = 1/2

    # natural circular frequency of the SDOF system
    wn = 2*math.pi/T

    # initialization
    npun =  len(gmr)
    y = np.zeros(npun)
    yp = np.zeros(npun)
    ypp = np.zeros(npun)
    ypp[0] = -gmr[0] - 2*wn*zeta*yp[0] - wn**2*y[0]

    # Integration coefficients
    keff = wn**2 + 1/(beta * dt**2) + gamma * 2 * wn * zeta /(beta * dt) 
    a1 = 1/(beta * dt**2) + gamma * 2 * wn * zeta/(beta * dt)
    a2 = 1/(beta * dt) + 2 * wn * zeta * (gamma/beta - 1)
    a3 = (1/(2 * beta) - 1) + 2 * wn * zeta * dt * (gamma/(2 * beta) - 1)    

    # Solves ODE
    for i in range(npun-1):
        y[i+1]   =  (-gmr[i+1] + a1 * y[i] + a2 * yp[i] + a3 * ypp[i])/keff
        ypp[i+1] = (y[i+1] -y[i] - dt * yp[i] - dt**2 * ypp[i]/2) / (beta * dt**2) + ypp[i]
        yp[i+1]  = yp[i] + dt * ypp[i] + dt * gamma * (ypp[i+1] - ypp[i])

    Sd = np.amax(np.abs(y))
    Sa = Sd * wn**2

    return Sa

# Calculates Spectrum
gmr = pd.read_csv(name, header=None)
gmr = gmr.values # makes it numpy array
dt = gmr[2,0] - gmr[1,0]
Sa = np.zeros(len(T))
for j in range(len(T)):
    Sa[j] = Newmark_DIM(gmr[:,1], T[j], zeta, dt)
Spectra[0][:,0] = T
Spectra[0][:,1] = Sa

print("Finished")
end = time.time()
print(end - start)
  • There are many similar questions already on this site; you are using different languages, so expect different performance. Your question shows no evidence that you've tried to profile your code, found any specific bottlenecks, or tried to improve on them. Please narrow down which parts of your code are particularly slow, produce a minimal reproducible example (i.e. not dependent on importing data we don't have) and edit your question. Commented Jun 24, 2019 at 13:46
  • As the comment above also says, it would make sense to find out which function is taking the most computation time and compare that for the two frameworks. As it is, the question is too broad. Commented Jun 24, 2019 at 13:48
  • My apologies if my question or explanation is not properly formulated, but it is my first post. Nevertheless, it is a very simple example and the code is well structured and commented. Any experienced user of both languages should be able to identify the reason, or at least suggest a few things. We are talking about a difference of 50 seconds vs. more than an hour. Commented Jun 24, 2019 at 13:53
  • The two linked posts offer explanations and some suggestions. Pay special attention to the suggestions of using Numba. Commented Jun 24, 2019 at 15:51
  • Thank you very much, guys! Numba saved the day (see the sketch below). Actually, the Python code is now way faster than MATLAB. Commented Jun 24, 2019 at 20:13
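
For reference, here is a minimal sketch of the Numba change the comments point to, assuming only that the numba package is installed (the function name newmark_dim_jit is illustrative, not from the original post). Decorating the integration function with @njit compiles the per-sample Python loop to machine code on the first call, which removes the interpreter overhead that dominates the 25-second runtime:

import numpy as np
from numba import njit

@njit
def newmark_dim_jit(gacc, T, zeta, dt):
    # newmark coefficients for constant average acceleration
    beta = 0.25
    gamma = 0.5

    # natural circular frequency of the SDOF system
    wn = 2.0 * np.pi / T

    # initialization (zero initial displacement and velocity)
    npun = gacc.shape[0]
    y = np.zeros(npun)
    yp = np.zeros(npun)
    ypp = np.zeros(npun)
    ypp[0] = -gacc[0]

    # integration coefficients
    keff = wn**2 + 1.0/(beta*dt**2) + gamma*2.0*wn*zeta/(beta*dt)
    a1 = 1.0/(beta*dt**2) + gamma*2.0*wn*zeta/(beta*dt)
    a2 = 1.0/(beta*dt) + 2.0*wn*zeta*(gamma/beta - 1.0)
    a3 = (1.0/(2.0*beta) - 1.0) + 2.0*wn*zeta*dt*(gamma/(2.0*beta) - 1.0)

    # time-stepping loop, compiled to machine code by Numba
    for i in range(npun - 1):
        y[i+1] = (-gacc[i+1] + a1*y[i] + a2*yp[i] + a3*ypp[i]) / keff
        ypp[i+1] = (y[i+1] - y[i] - dt*yp[i] - dt**2*ypp[i]/2.0)/(beta*dt**2) + ypp[i]
        yp[i+1] = yp[i] + dt*ypp[i] + dt*gamma*(ypp[i+1] - ypp[i])

    # spectral acceleration from peak displacement
    return np.max(np.abs(y)) * wn**2

The spectrum loop itself is unchanged (Sa[j] = newmark_dim_jit(gmr[:,1], T[j], zeta, dt)); the first call pays a one-time compilation cost, and subsequent calls run at compiled speed, consistent with the last comment's report that the Python version ended up faster than MATLAB.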
