
I have the following information (a dataframe) in Python:

product baskets scaling_factor
12345   475     95.5
12345   108     57.7
12345   2       1.4
12345   38      21.9
12345   320     88.8

and I want to run the following non-linear regression and estimate the parameters.

a, b and c

Equation that I want to fit:

scaling_factor = a - (b*np.exp(c*baskets))

In SAS we usually run the following model (it uses the Gauss-Newton method):

proc nlin data=scaling_factors;
  parms a=100 b=100 c=-0.09;
  model scaling_factor = a - (b * (exp(c*baskets)));
  output out=scaling_equation_parms parms=a b c;
run;

Is there a similar way to estimate the parameters in Python using non-linear regression? And how can I see the plot in Python?

2 Comments

  • I suggest you check non-linear regression in scipy: scipy-cookbook.readthedocs.io/items/robust_regression.html
  • Yeah, I was looking at exactly that, but I could not figure out how they used t_train and y_train.

2 Answers


For problems like these I always use scipy.optimize.minimize with my own least squares function. The optimization algorithms don't handle large differences between the various inputs well, so it is a good idea to scale the parameters in your function so that the parameters exposed to scipy are all on the order of 1, as I've done below.

import numpy as np
import scipy.optimize

baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

def lsq(arg):
    # rescale so that the optimizer sees parameters of order 1
    a = arg[0] * 100
    b = arg[1] * 100
    c = arg[2] * 0.1
    now = a - (b * np.exp(c * baskets)) - scaling_factor
    return np.sum(now**2)

# scaled equivalents of the SAS starting values a=100, b=100, c=-0.09
guesses = [1, 1, -0.9]
res = scipy.optimize.minimize(lsq, guesses)

print(res.message)
# 'Optimization terminated successfully.'

print(res.x)
# [ 0.97336709  0.98685365 -0.07998282]

print([lsq(guesses), lsq(res.x)])
# [7761.0093358076601, 13.055053196410928]
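
To recover the fitted parameters in their original units, undo the scaling applied inside lsq (a small addition based on the result printed above; the values are approximate):

# invert the scaling from lsq: a and b were divided by 100, c by 0.1
a = res.x[0] * 100   # ~97.34
b = res.x[1] * 100   # ~98.69
c = res.x[2] * 0.1   # ~-0.0080
print(a, b, c)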

Of course, as with all minimization problems, it is important to use good initial guesses, since all of the algorithms can get trapped in a local minimum. The optimization method can be changed with the method keyword; some of the possibilities are

  • ‘Nelder-Mead’
  • ‘Powell’
  • ‘CG’
  • ‘BFGS’
  • ‘Newton-CG’

The default is BFGS according to the documentation.
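
For example, switching to the gradient-free Nelder-Mead method can help if BFGS stops with a precision-loss warning (a minimal sketch, reusing the lsq function and guesses from above):

res_nm = scipy.optimize.minimize(lsq, guesses, method='Nelder-Mead')
print(res_nm.message, res_nm.x)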


3 Comments

Many thanks for this. Out of curiosity, which method is used by default? Can I use guesses = [100, 100, -0.09] as mentioned in the proc nlin in SAS above? Also, how is this different from scipy.optimize.least_squares?
@Mukul I didn't realize those were your guesses; I'm not familiar with SAS. I've updated the answer to use those values. Note that I've scaled the parameters in the least squares function in order to keep them all close to 1.
OK, this seems to have worked, but I'm getting the following message: "Desired error not necessarily achieved due to precision loss".
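
Regarding the least_squares question in the first comment, here is a minimal sketch of that route, assuming the same baskets and scaling_factor arrays defined in the answer above. Unlike minimize, least_squares expects a function that returns the vector of residuals rather than their summed squares:

from scipy.optimize import least_squares

def residuals(params):
    a, b, c = params
    return scaling_factor - (a - b * np.exp(c * baskets))

# least_squares minimizes 0.5 * sum(residuals**2) internally
res_ls = least_squares(residuals, x0=[100, 100, -0.09])
print(res_ls.x)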

Agreeing with Chris Mueller, I'd also use scipy, but with scipy.optimize.curve_fit. The code looks like:

###the top two lines are required on my linux machine
import matplotlib
matplotlib.use('Qt4Agg')
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit  # we could import more, but this is what we need

###defining your fit function
def func(x, a, b, c):
    return a - b * np.exp(c * x)

###OP's data
baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])

###let us guess some start values
initialGuess = [100, 100, -.01]
guessedFactors = [func(x, *initialGuess) for x in baskets]

###making the actual fit
popt, pcov = curve_fit(func, baskets, scaling_factor, initialGuess)

#one may want to
print(popt)
print(pcov)

###preparing data for showing the fit
basketCont = np.linspace(min(baskets), max(baskets), 50)
fittedData = [func(x, *popt) for x in basketCont]

###preparing the figure
fig1 = plt.figure(1)
ax = fig1.add_subplot(1, 1, 1)

###the three sets of data to plot
ax.plot(baskets, scaling_factor, linestyle='', marker='o', color='r', label="data")
ax.plot(baskets, guessedFactors, linestyle='', marker='^', color='b', label="initial guess")
ax.plot(basketCont, fittedData, linestyle='-', color='#900000',
        label="fit with ({0:0.2g},{1:0.2g},{2:0.2g})".format(*popt))

###beautification
ax.legend(loc=0, title="graphs", fontsize=12)
ax.set_ylabel("factor")
ax.set_xlabel("baskets")
ax.grid()
ax.set_title(r"$\mathrm{curve}_\mathrm{fit}$")

###putting the covariance matrix nicely
tab = [['{:.2g}'.format(j) for j in i] for i in pcov]
the_table = plt.table(cellText=tab,
                      colWidths=[0.2] * 3,
                      loc='upper right', bbox=[0.483, 0.35, 0.5, 0.25])
plt.text(250, 65, 'covariance:', size=12)

###showing the plot
plt.show()
###done

Eventually, giving you a figure showing the data points, the initial guess, the fitted curve labeled with the fitted parameters, and the covariance matrix table.
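
If you also want standard errors on the fitted parameters, the diagonal of pcov holds their variances (a small addition to the answer, standard curve_fit usage):

perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of a, b, c
print(perr)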

2 Comments

Woah, this perfectly matched the output of SAS. Thanks so much!
@Mukul You're welcome. Note that you can achieve similar results with several scipy functions, including minimize as suggested by Chris Mueller, and leastsq, where the latter can also give the covariance matrix if you apply the option full_output. Also note, a good guess for initial values always helps, but I guess you got that already.
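
For completeness, a minimal sketch of the leastsq route mentioned in the last comment, reusing func, baskets, and scaling_factor from the answer above. Note that the returned cov_x must be scaled by the residual variance to obtain the parameter covariance:

from scipy.optimize import leastsq

def residuals(params, x, y):
    return y - func(x, *params)

out = leastsq(residuals, [100, 100, -.01],
              args=(baskets, scaling_factor), full_output=True)
popt_lsq, cov_x, infodict, mesg, ier = out

# scale cov_x by the residual variance to get the parameter covariance
res_var = np.sum(infodict['fvec']**2) / (len(baskets) - len(popt_lsq))
pcov_lsq = cov_x * res_var
print(popt_lsq)
print(pcov_lsq)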
