
I have this very simple problem, but somehow I have not found a solution for it yet:

I have two curves, A1 = [1,2,3] and A2 = [4,5,6].

I want to fit those curves to another curve B1 = [4,5,3] with linear regression, so that B1 = a*A1 + b*A2.

This can easily be done with sklearn's LinearRegression, but sklearn does not give you the standard deviation of the fitting parameters.

I tried using statsmodels, but somehow I can't get the format right:

import numpy as np
import statsmodels.api as sm

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([4, 5, 3])

ols = sm.OLS(a, b)

Error: ValueError: endog and exog matrices are different sizes

  • Do you want the STD of the predictions or of the coefficients? Commented Aug 13, 2021 at 14:23

2 Answers


If your formula is B1 = a*A1 + b*A2, then the array b is your endogenous variable and the array a holds your exogenous variables. You need to transpose your exogenous array so that each row is an observation:

ols = sm.OLS(b, a.T)
res = ols.fit()
res.summary()

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.250
Model:                            OLS   Adj. R-squared:                 -0.500
Method:                 Least Squares   F-statistic:                    0.3333
Date:                Sat, 14 Aug 2021   Prob (F-statistic):              0.667
Time:                        05:48:08   Log-Likelihood:                -3.2171
No. Observations:                   3   AIC:                             10.43
Df Residuals:                       1   BIC:                             8.631
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -2.1667      1.462     -1.481      0.378     -20.749      16.416
x2             1.6667      0.624      2.673      0.228      -6.257       9.590
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   3.000
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.531
Skew:                           0.707   Prob(JB):                        0.767
Kurtosis:                       1.500   Cond. No.                         12.3
==============================================================================
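
Note that res.summary() renders on its own only in an interactive session such as IPython or Jupyter; in a plain script, echoing res just shows the RegressionResultsWrapper repr. Use print(), or read the pieces you need directly from the results object:

print(res.summary())   # writes the table above to stdout
res.params             # fitted coefficients
res.bse                # standard errors of the coefficients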

From sklearn:

from sklearn.linear_model import LinearRegression
reg = LinearRegression(fit_intercept=False).fit(a.T,b)
reg.coef_
array([-2.16666667,  1.66666667])
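
Note that fit_intercept=False is what makes the two fits comparable: sm.OLS does not add a constant automatically, so both models fit B1 = a*A1 + b*A2 with no intercept. If you do want an intercept on the statsmodels side, pass sm.add_constant(a.T) as the exogenous matrix.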

Comments

  • When I follow your method and write out res, I just get: <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7f38c12e5240>
  • By the way, is there a way to enforce positive fitting parameters with statsmodels?

Assume you have a data matrix X where each row is an observation, and a column vector y. The OLS solution is given by

beta_hat = (X'X)^(-1)*X'*y

where X' is the transpose of X and (X'X)^(-1) the matrix inverse of X'X.

The covariance matrix of the coefficients is given by

Var(beta_hat) = s^2*(X'X)^(-1)

with s^2 being the estimated residual variance (residual sum of squares divided by the residual degrees of freedom).

With that you can use any linear-algebra tool, e.g. NumPy, to do the calculations easily, as in the sketch below.
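
A minimal NumPy sketch of those formulas, using the arrays from the question (the variable names here are my own):

import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]]).T   # each row is an observation
y = np.array([4, 5, 3])

# beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# s^2 = residual sum of squares / residual degrees of freedom
resid = y - X @ beta_hat
dof = X.shape[0] - X.shape[1]            # n observations minus k parameters
s2 = resid @ resid / dof

# Var(beta_hat) = s^2 * (X'X)^(-1); standard errors are the sqrt of the diagonal
cov_beta = s2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_beta))

print(beta_hat)   # approx [-2.167, 1.667]
print(std_err)    # approx [1.462, 0.624], matching the statsmodels summary above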

