
I have the following information (a dataframe) in Python:

product baskets scaling_factor
12345   475     95.5
12345   108     57.7
12345   2       1.4
12345   38      21.9
12345   320     88.8

and I want to run the following non-linear regression and estimate the parameters.

a, b and c

Equation that I want to fit:

scaling_factor = a - (b*np.exp(c*baskets))

In SAS we usually run the following model (it uses the Gauss-Newton method):

proc nlin data=scaling_factors;
  parms a=100 b=100 c=-0.09;
  model scaling_factor = a - (b * (exp(c*baskets)));
  output out=scaling_equation_parms parms=a b c;
run;

Is there a similar way to estimate the parameters in Python using non-linear regression? And how can I see the plot in Python?

2 Comments

  • I suggest you check non-linear regression in scipy: scipy-cookbook.readthedocs.io/items/robust_regression.html
  • Yeah, I was looking at exactly that, but I could not figure out how they used t_train and y_train.

2 Answers


For problems like these I always use scipy.optimize.minimize with my own least squares function. The optimization algorithms don't handle large differences between the various inputs well, so it is a good idea to scale the parameters in your function so that the parameters exposed to scipy are all on the order of 1, as I've done below.

import numpy as np
import scipy.optimize

baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

def lsq(arg):
    # rescale so that the optimizer sees parameters of order 1
    a = arg[0] * 100
    b = arg[1] * 100
    c = arg[2] * 0.1
    now = a - (b * np.exp(c * baskets)) - scaling_factor
    return np.sum(now**2)

# scaled equivalents of the SAS starting values a=100, b=100, c=-0.09
guesses = [1, 1, -0.9]
res = scipy.optimize.minimize(lsq, guesses)

print(res.message)
# 'Optimization terminated successfully.'

print(res.x)
# [ 0.97336709  0.98685365 -0.07998282]

print([lsq(guesses), lsq(res.x)])
# [7761.0093358076601, 13.055053196410928]
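
To recover the fitted parameters in their original units, undo the scaling applied inside lsq (a small addition based on the result printed above; the values are approximate):

# invert the scaling from lsq: a and b were divided by 100, c by 0.1
a = res.x[0] * 100   # ~97.34
b = res.x[1] * 100   # ~98.69
c = res.x[2] * 0.1   # ~-0.0080
print(a, b, c)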

Of course, as with all minimization problems, it is important to use good initial guesses, since all of the algorithms can get trapped in a local minimum. The optimization method can be changed with the method keyword; some of the possibilities are

  • ‘Nelder-Mead’
  • ‘Powell’
  • ‘CG’
  • ‘BFGS’
  • ‘Newton-CG’

The default is BFGS according to the documentation.
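
For example, switching to the gradient-free Nelder-Mead method can help if BFGS stops with a precision-loss warning (a minimal sketch, reusing the lsq function and guesses from above):

res_nm = scipy.optimize.minimize(lsq, guesses, method='Nelder-Mead')
print(res_nm.message, res_nm.x)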


3 Comments

Many thanks for this. Out of curiosity, which method is used by default? Can I use guesses = [100, 100, -0.09] as mentioned in the proc nlin in SAS above? Also, how is this different from scipy.optimize.least_squares?
@Mukul I didn't realize those were your guesses; I'm not familiar with SAS. I've updated the answer to use those values. Note that I've scaled the parameters in the least squares function in order to keep them all close to 1.
OK, this seems to have worked, but I'm getting the following message: "Desired error not necessarily achieved due to precision loss".
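
Regarding the least_squares question in the first comment, here is a minimal sketch of that route, assuming the same baskets and scaling_factor arrays defined in the answer above. Unlike minimize, least_squares expects a function that returns the vector of residuals rather than their summed squares:

from scipy.optimize import least_squares

def residuals(params):
    a, b, c = params
    return scaling_factor - (a - b * np.exp(c * baskets))

# least_squares minimizes 0.5 * sum(residuals**2) internally
res_ls = least_squares(residuals, x0=[100, 100, -0.09])
print(res_ls.x)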

Agreeing with Chris Mueller, I'd also use scipy, but with scipy.optimize.curve_fit. The code looks like:

###the top two lines are required on my linux machine
import matplotlib
matplotlib.use('Qt4Agg')
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit  # we could import more, but this is what we need

###defining your fit function
def func(x, a, b, c):
    return a - b * np.exp(c * x)

###OP's data
baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

###let us guess some start values
initialGuess = [100, 100, -.01]
guessedFactors = [func(x, *initialGuess) for x in baskets]

###making the actual fit
popt, pcov = curve_fit(func, baskets, scaling_factor, initialGuess)

#one may want to
print(popt)
print(pcov)

###preparing data for showing the fit
basketCont = np.linspace(min(baskets), max(baskets), 50)
fittedData = [func(x, *popt) for x in basketCont]

###preparing the figure
fig1 = plt.figure(1)
ax = fig1.add_subplot(1, 1, 1)

###the three sets of data to plot
ax.plot(baskets, scaling_factor, linestyle='', marker='o', color='r', label="data")
ax.plot(baskets, guessedFactors, linestyle='', marker='^', color='b', label="initial guess")
ax.plot(basketCont, fittedData, linestyle='-', color='#900000',
        label="fit with ({0:0.2g},{1:0.2g},{2:0.2g})".format(*popt))

###beautification
ax.legend(loc=0, title="graphs", fontsize=12)
ax.set_ylabel("factor")
ax.set_xlabel("baskets")
ax.grid()
ax.set_title(r"$\mathrm{curve}_\mathrm{fit}$")

###putting the covariance matrix nicely
tab = [['{:.2g}'.format(j) for j in i] for i in pcov]
the_table = plt.table(cellText=tab,
                      colWidths=[0.2] * 3,
                      loc='upper right', bbox=[0.483, 0.35, 0.5, 0.25])
plt.text(250, 65, 'covariance:', size=12)

###showing the plot
plt.show()
###done

Eventually, giving you a figure showing the data points, the initial guess, the fitted curve labeled with the fitted parameters, and the covariance matrix table.
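
If you also want standard errors on the fitted parameters, the diagonal of pcov holds their variances (a small addition to the answer, standard curve_fit usage):

perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of a, b, c
print(perr)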

2 Comments

Woah, this perfectly matched the output of SAS. Thanks so much!
@Mukul You're welcome. Note that you can achieve similar results with several scipy functions, including minimize as suggested by Chris Mueller, and leastsq, where the latter can also give the covariance matrix if you apply the option full_output. Also note, a good guess for initial values always helps, but I guess you got that already.
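
For completeness, a minimal sketch of the leastsq route mentioned in the last comment, reusing func, baskets, and scaling_factor from the answer above. Note that the returned cov_x must be scaled by the residual variance to obtain the parameter covariance:

from scipy.optimize import leastsq

def residuals(params, x, y):
    return y - func(x, *params)

out = leastsq(residuals, [100, 100, -.01],
              args=(baskets, scaling_factor), full_output=True)
popt_lsq, cov_x, infodict, mesg, ier = out

# scale cov_x by the residual variance to get the parameter covariance
res_var = np.sum(infodict['fvec']**2) / (len(baskets) - len(popt_lsq))
pcov_lsq = cov_x * res_var
print(popt_lsq)
print(pcov_lsq)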
