Can someone point me towards an algorithm or Python module that can help me with regression on the dataframe below? There are two difficulties. First, the data is in a dataframe, and I need to find the best match between two columns: one is a function of another column and some parameters, the other is my experimental data. Second, the number of regression variables is getting rather large. Here are the details:

Common regression variables:

var1, var2

The dataframe (template):

Column1 Column2 Column3 Column4
x01 var01.3 f(x01, var1, var2, var01.3) resp01.01
x02 var01.3 f(x02, var1, var2, var01.3) resp01.02
x03 var01.3 f(x03, var1, var2, var01.3) resp01.03
x04 var01.3 f(x04, var1, var2, var01.3) resp01.04
x05 var01.3 f(x05, var1, var2, var01.3) resp01.05
..  ..      ..                          ..
x16 var01.3 f(x16, var1, var2, var01.3) resp01.16
# Next sequence
x01 var02.3 f(x01, var1, var2, var02.3) resp02.01
x02 var02.3 f(x02, var1, var2, var02.3) resp02.02
x03 var02.3 f(x03, var1, var2, var02.3) resp02.03
x04 var02.3 f(x04, var1, var2, var02.3) resp02.04
x05 var02.3 f(x05, var1, var2, var02.3) resp02.05
..  ..      ..                          ..
x16 var02.3 f(x16, var1, var2, var02.3) resp02.16
# More lines here
x01 var12.3 f(x01, var1, var2, var12.3) resp12.01
x02 var12.3 f(x02, var1, var2, var12.3) resp12.02
x03 var12.3 f(x03, var1, var2, var12.3) resp12.03
x04 var12.3 f(x04, var1, var2, var12.3) resp12.04
x05 var12.3 f(x05, var1, var2, var12.3) resp12.05
..  ..      ..                          ..
x16 var12.3 f(x16, var1, var2, var12.3) resp12.16

Desired objective:

Column3~=Column4

In essence, the dataframe is 12 repeats of the same 16 values in Column1. Column2 holds 12 values, each constant within its 16-line sequence but different between the 12 repeats. Column3 is a function of Column1, Column2, var1 and var2. Column4 is my reference data. I would like to get Column3 as close as possible to Column4 (with RMSE as the criterion, I guess?).
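For reference, the RMSE criterion mentioned above could be computed along these lines. This is only a minimal sketch: the helper name `rmse` and the three-row `demo` frame are made up for illustration, and only the column names `Column3` and `Column4` come from the template.

```python
import numpy as np
import pandas as pd


def rmse(df: pd.DataFrame, model_col: str = "Column3", ref_col: str = "Column4") -> float:
    """Root-mean-square error between the modelled and the reference column."""
    diff = df[model_col].to_numpy(dtype=float) - df[ref_col].to_numpy(dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Tiny made-up frame, just to show the call (not real data).
demo = pd.DataFrame({"Column3": [1.0, 2.0, 3.0], "Column4": [1.0, 2.0, 5.0]})
demo_rmse = rmse(demo)
```

Minimising the sum of squared differences and minimising the RMSE lead to the same parameter values, so a least-squares fitter targets this criterion directly.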

My summary of the regression variables:

  • var1, var2 (2 variables)
  • var01.3 to var12.3 (12 variables)
  • x01 to x16 (16 variables)

Total number of regression variables: 2 + 12 + 16 = 30

Can someone give me a hint on how to approach such a problem? Thank you in advance!

  • statsmodels is the package I would go for in this case. It has extensive documentation, so start from there. Commented May 18, 2020 at 5:43
  • One problem though: f(x1..xi) is not a linear function but a relationship based on the gamma distribution, so I am not sure linear regression helps here. Commented May 19, 2020 at 11:51
  • Found a solution here: stackoverflow.com/questions/52838089/… Worked like a charm! Commented May 25, 2020 at 13:38
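For readers with a similar layout, here is a minimal sketch of one way a joint nonlinear fit could be set up with `scipy.optimize.least_squares`. It is not necessarily the approach from the linked answer. Everything specific in it is an assumption: the form of `f` (a gamma pdf with shape `var1` and scale `var2`, scaled by the per-sequence parameter) is hypothetical, the `seq` column and the synthetic data are invented for the demo, and the Column1 values are treated as known inputs rather than fitted, which leaves 2 + 12 = 14 free parameters.

```python
import numpy as np
import pandas as pd
from scipy.optimize import least_squares
from scipy.stats import gamma

N_SEQ, N_X = 12, 16  # 12 sequences of 16 x values, as in the template


def f(x, var1, var2, var3):
    # HYPOTHETICAL model: gamma pdf (shape var1, scale var2) scaled by var3.
    # Replace with the real relationship.
    return var3 * gamma.pdf(x, a=var1, scale=var2)


def residuals(params, x, seq, resp):
    # params = [var1, var2, var01.3, ..., var12.3]
    var1, var2 = params[0], params[1]
    var3 = params[2:][seq]  # pick each row's sequence-specific parameter
    return f(x, var1, var2, var3) - resp


# Synthetic stand-in for the experimental data (assumption, not real data).
rng = np.random.default_rng(0)
x = np.tile(np.linspace(0.5, 8.0, N_X), N_SEQ)
seq = np.repeat(np.arange(N_SEQ), N_X)  # sequence index 0..11 for every row
true = np.concatenate(([2.5, 1.2], np.linspace(1.0, 3.0, N_SEQ)))
resp = f(x, true[0], true[1], true[2:][seq]) + rng.normal(0.0, 0.002, x.size)
df = pd.DataFrame({"Column1": x, "seq": seq, "Column4": resp})

# Starting guess and lower bounds (shape and scale must stay positive).
start = np.concatenate(([2.0, 1.0], np.ones(N_SEQ)))
lower = np.concatenate(([1e-3, 1e-3], np.zeros(N_SEQ)))

x_arr = df["Column1"].to_numpy()
seq_arr = df["seq"].to_numpy()
fit = least_squares(residuals, start, bounds=(lower, np.inf),
                    args=(x_arr, seq_arr, df["Column4"].to_numpy()))

# Fill Column3 with the fitted model and report the RMSE against Column4.
df["Column3"] = f(x_arr, fit.x[0], fit.x[1], fit.x[2:][seq_arr])
fit_rmse = float(np.sqrt(np.mean((df["Column3"] - df["Column4"]) ** 2)))
```

The key design choice is packing the two shared parameters and the twelve per-sequence parameters into one flat vector, then using the integer `seq` index to look up the right per-sequence value for every row, so all 192 rows are fitted in a single call.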
