So, I am not really a programmer, but I need to figure out how one quantity depends on two variables. I have been googling extensively, but I can't figure out how to input my data into sklearn's linear_model.

I have a dataframe defined thus

import pandas as pd

I = [-2, 0, 5, 10, 15, 20, 25, 30]
d = {27.11 : [9.01,8.555,7.56,6.77,6.14,5.63,5.17,4.74],
     28.91 : [8.89,8.43,7.46,6.69,6.07,5.56,5.12,4.68],
     30.72 : [8.76,8.32,7.36,6.60,6.00,5.50,5.06,4.69],
     32.52 : [8.64,8.20,7.26,6.52,5.93,5.44,5.00,4.58],
     34.33 : [8.52,8.08,7.16,6.44,5.86,5.38,4.95,4.52],
     36.11 : [8.39,7.97,7.07,6.35,5.79,5.31,4.86,4.46]}
oxy = pd.DataFrame(index = I, data = d) # temp, salinity to oxygenation ml/L

The index represents temperature and the column names represent salinity. I need a way to predict oxygenation (the values in the columns) from temperature and salinity.

I think my issue is mostly syntax related.

I have tried fitting my data with

X = [list(oxy.columns.values),list(oxy.index.values)]
regr = linear_model.LinearRegression()
regr.fit(X,oxy)

along with lots of variants, trying to get the value at each (index, column) position in the table to be associated with each X. I am really just not figuring out how to do this.

I found lots of guides on two variables, but they all had flat datasets, and I don't know how to flatten this without lots and lots of typing.

So my question is: is there a way to do a regression on two variables with my independent variables being the index and column values of a pandas dataframe, and/or is there a quick and efficient way to flatten this table into a 48 by 3 dataframe, so that one of the many guides I've found will actually help me?

Thank you in advance.

1 Answer

You can use stack to reshape the data, and then rename the columns:

oxy2 = oxy.stack().reset_index()
# after reset_index, the first column is the old index (temperature)
# and the second is the old column label (salinity)
oxy2.columns = ['temperature','salinity','oxygenation']

Output is a 48 by 3 dataframe. Showing only first 5 rows:

#print(oxy2.head())
   temperature  salinity  oxygenation
0           -2     27.11        9.010
1           -2     28.91        8.890
2           -2     30.72        8.760
3           -2     32.52        8.640
4           -2     34.33        8.520

Then you can run the regression using the following code:

from sklearn import linear_model

regr = linear_model.LinearRegression()
regr.fit(oxy2[['salinity','temperature']], oxy2['oxygenation'])
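For reference, here is a minimal end-to-end sketch of the reshape, the fit and a prediction in one runnable piece. It assumes, as the question states, that the index holds temperature and the column labels hold salinity. The names `temps`, `new_point` and `predicted`, and the query values 12.0 and 33.0, are made up for illustration.

```python
import pandas as pd
from sklearn import linear_model

# Data from the question: index = temperature, columns = salinity,
# cell values = oxygenation in ml/L.
temps = [-2, 0, 5, 10, 15, 20, 25, 30]
d = {27.11: [9.01, 8.555, 7.56, 6.77, 6.14, 5.63, 5.17, 4.74],
     28.91: [8.89, 8.43, 7.46, 6.69, 6.07, 5.56, 5.12, 4.68],
     30.72: [8.76, 8.32, 7.36, 6.60, 6.00, 5.50, 5.06, 4.69],
     32.52: [8.64, 8.20, 7.26, 6.52, 5.93, 5.44, 5.00, 4.58],
     34.33: [8.52, 8.08, 7.16, 6.44, 5.86, 5.38, 4.95, 4.52],
     36.11: [8.39, 7.97, 7.07, 6.35, 5.79, 5.31, 4.86, 4.46]}
oxy = pd.DataFrame(index=temps, data=d)

# Flatten to one row per (temperature, salinity) pair.
# After reset_index the old index comes first, then the old column label.
oxy2 = oxy.stack().reset_index()
oxy2.columns = ['temperature', 'salinity', 'oxygenation']

# Fit oxygenation as a linear function of the two predictors.
regr = linear_model.LinearRegression()
regr.fit(oxy2[['temperature', 'salinity']], oxy2['oxygenation'])

# Coefficients are in the same order as the feature columns.
print(regr.coef_, regr.intercept_)

# Predict for a hypothetical new point (values chosen for illustration only).
new_point = pd.DataFrame({'temperature': [12.0], 'salinity': [33.0]})
predicted = regr.predict(new_point)
print(predicted)
```

Passing the new point as a dataframe with the same column names as the training data keeps the feature order unambiguous.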

3 Comments

Thank you so much. I've been messing around with pd.melt thinking it would help but this is much better!
I suggest visually inspecting scatterplots of temperature and salinity versus oxygenation to determine if there is any obvious data transform such as log or exp that might help with the linear regression - this is fast and easy to do.
I see from a 3D scatterplot that the data does not lie on a flat plane. When I add an interaction effect of "salinity * temperature" to the regression, the fit is improved.
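The interaction idea in the last comment could be tried roughly as follows. This is only a sketch: the column name `temp_x_sal` and the variable names are made up here, and the comparison uses R² on the training data, which can never decrease when a term is added to an ordinary least-squares model, so it shows how much the fit changes rather than proving the larger model is better.

```python
import pandas as pd
from sklearn import linear_model

# Data from the question: index = temperature, columns = salinity.
temps = [-2, 0, 5, 10, 15, 20, 25, 30]
d = {27.11: [9.01, 8.555, 7.56, 6.77, 6.14, 5.63, 5.17, 4.74],
     28.91: [8.89, 8.43, 7.46, 6.69, 6.07, 5.56, 5.12, 4.68],
     30.72: [8.76, 8.32, 7.36, 6.60, 6.00, 5.50, 5.06, 4.69],
     32.52: [8.64, 8.20, 7.26, 6.52, 5.93, 5.44, 5.00, 4.58],
     34.33: [8.52, 8.08, 7.16, 6.44, 5.86, 5.38, 4.95, 4.52],
     36.11: [8.39, 7.97, 7.07, 6.35, 5.79, 5.31, 4.86, 4.46]}
oxy2 = pd.DataFrame(index=temps, data=d).stack().reset_index()
oxy2.columns = ['temperature', 'salinity', 'oxygenation']

# Hypothetical extra feature: the product of the two predictors.
oxy2['temp_x_sal'] = oxy2['temperature'] * oxy2['salinity']

y = oxy2['oxygenation']
X_base = oxy2[['temperature', 'salinity']]
X_inter = oxy2[['temperature', 'salinity', 'temp_x_sal']]

# Fit both models and compare R^2 on the same data.
base = linear_model.LinearRegression().fit(X_base, y)
inter = linear_model.LinearRegression().fit(X_inter, y)

r2_base = base.score(X_base, y)
r2_inter = inter.score(X_inter, y)
print("R^2 without interaction:", r2_base)
print("R^2 with interaction:   ", r2_inter)
```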
