I'm looking for an efficient function that automatically produces the betas for every possible multiple regression model, given a dependent variable and a set of predictors as a DataFrame in Python.

For example, given this set of data:

[Image: https://i.sstatic.net/YuPuv.jpg]
The dependent variable is 'Cases per Capita' and the columns following are the predictor variables.

In a simpler example:


  Student   Grade    Hours Slept   Hours Studied   ...  
 --------- -------- ------------- --------------- ----- 
  A             90             9               1   ...  
  B             85             7               2   ...  
  C            100             4               5   ...  
  ...          ...           ...             ...   ...  

where the beta matrix output would look as such:


  Regression   Hours Slept   Hours Studied  
 ------------ ------------- --------------- 
           1   #             N/A            
           2   N/A           #              
           3   #             #              

The table would have 2^n - 1 rows, where n is the number of predictors. With 5 predictors and 1 dependent variable, for example, there would be 31 regressions, each using a different combination of predictors.
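As a quick sanity check of that count, here is a minimal sketch that enumerates every non-empty subset of five predictors. The predictor names beyond the two shown in the table above are made up for illustration.

```python
from itertools import combinations

# Hypothetical predictor names; only the first two appear in the example table.
predictors = ["Hours Slept", "Hours Studied", "Hours Exercised",
              "Classes Attended", "Age"]

# Every non-empty subset of predictors, grouped by subset size 1..n.
subsets = [
    combo
    for size in range(1, len(predictors) + 1)
    for combo in combinations(predictors, size)
]

print(len(subsets))  # 2**5 - 1 = 31
```

Each tuple in `subsets` corresponds to one row of the desired beta table.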

The process is described in greater detail here and an actual solution that is written in R is posted here.

1 Answer

I am not aware of any package that does this out of the box. However, you can generate all 2^n - 1 combinations of columns, where n is the number of columns in X (the independent variables), fit a linear regression model for each combination, and collect the coefficients (betas) from each model.

Here is how I would do it:

from sklearn import datasets, linear_model
import numpy as np
from itertools import combinations

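# Test dataset. Note: load_boston was removed in scikit-learn 1.2, so this line
# needs an older release; with a newer one, substitute another regression dataset
# (e.g. datasets.load_diabetes), though the numbers below will then differ.
X, y = datasets.load_boston(return_X_y=True)

X = X[:, :3]  # The original X has 13 columns; keep only the first n=3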

# Create all 2^n-1 combinations of columns (7 here, because n=3), where n is the number of features/independent variables

all_combs = [] 
for i in range(X.shape[1]):
    all_combs.extend(combinations(range(X.shape[1]),i+1))

# print 2^n-1 combinations
print('2^n-1 combinations are:')
print(all_combs) 

## Create the betas/coefficients matrix with 2^n-1 rows and one column per feature, filled with NaN
betas = np.full((len(all_combs), X.shape[1]), np.nan)  # np.nan: the np.NaN alias was removed in NumPy 2.0

## Fit a model for each combination of columns and add the coefficients into betas matrix
lr = linear_model.LinearRegression()
for regression_no, comb in enumerate(all_combs):
    lr.fit(X[:,comb], y)
    betas[regression_no, comb] = lr.coef_

## Print Coefficients of each model
print('Regression No'.center(15)+" ".join(['column {}'.format(i).center(10) for i in range(X.shape[1])]))  
print('_'*50)
for index, beta in enumerate(betas):
    print('{}'.format(index + 1).center(15), " ".join(['{:.4f}'.format(beta[i]).center(10) for i in range(X.shape[1])]))

results in

2^n-1 combinations are:
[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]


    Regression No  column 0   column 1   column 2 
__________________________________________________
       1         -0.4152      nan        nan    
       2           nan       0.1421      nan    
       3           nan        nan      -0.6485  
       4         -0.3521     0.1161      nan    
       5         -0.2455      nan      -0.5234  
       6           nan       0.0564    -0.5462  
       7         -0.2486     0.0585    -0.4156  
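
Since the question asks for a DataFrame in and a table out, the same idea can be sketched with pandas. This is only an illustration under a few assumptions: the function name `all_subset_betas` is hypothetical, the data is made up, and the fit uses `numpy.linalg.lstsq` with an explicit intercept column rather than scikit-learn, which should give the same slopes as `LinearRegression` with its default intercept.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def all_subset_betas(df, target):
    """Return one row of OLS slopes per non-empty predictor subset (NaN where unused)."""
    predictors = [c for c in df.columns if c != target]
    y = df[target].to_numpy(dtype=float)
    rows = []
    for size in range(1, len(predictors) + 1):
        for combo in combinations(predictors, size):
            # Design matrix: intercept column plus the chosen predictors.
            X = np.column_stack(
                [np.ones(len(df)), df[list(combo)].to_numpy(dtype=float)]
            )
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            row = dict.fromkeys(predictors, np.nan)
            row.update(zip(combo, coef[1:]))  # skip the intercept
            rows.append(row)
    betas = pd.DataFrame(rows, columns=predictors)
    betas.index = pd.RangeIndex(1, len(betas) + 1, name="Regression")
    return betas


# Made-up data, for illustration only: Grade = 50 + 2*Slept + 5*Studied exactly.
df = pd.DataFrame({
    "Hours Slept":   [9, 7, 4, 6, 8],
    "Hours Studied": [1, 2, 5, 3, 4],
})
df.insert(0, "Grade", 50 + 2 * df["Hours Slept"] + 5 * df["Hours Studied"])

print(all_subset_betas(df, "Grade"))
```

The result has one row per regression and one column per predictor, matching the layout requested in the question. Keep in mind that the number of models doubles with every added predictor, so this brute-force approach is only practical for a modest number of columns.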