Learn linear regression with hands-on projects
In the 1980s, Marvel Comics introduced Destiny, a fictional character with the ability to foresee future events. The exciting news is that predicting the future is no longer just a fantasy! With the progress made in machine learning, a machine can help forecast future events by learning from the past.
Exciting, right? Let’s start this journey with a simple prediction model. Regression is a predictive modeling approach in machine learning in which an algorithm learns how one or more independent variables (features) relate to a dependent variable (outcome) in order to predict continuous values. Rather than delving into theory, the focus here will be on building different regression models.
Before starting to build a Python regression model, one should examine the data. For instance, if an individual owns a fish farm and needs to predict a fish’s weight based on its dimensions, they can explore the dataset by displaying the top few rows of the DataFrame.
First, the pandas library is imported to read the data into a DataFrame. The data is then read from the Fish.txt file, with the column names defined explicitly. Finally, the top five rows of the DataFrame are printed. The three lengths define the vertical, diagonal, and cross lengths in cm.
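A minimal sketch of this step follows. Since the Fish.txt file itself isn’t reproduced here, a few made-up sample rows stand in for it; the column names are an assumption based on the features used later in this post.

```python
import pandas as pd
from io import StringIO

# Stand-in for the contents of Fish.txt -- these rows are illustrative,
# not the real dataset.
data = StringIO("""242.0,23.2,25.4,30.0,11.52,4.02
290.0,24.0,26.3,31.2,12.48,4.31
340.0,23.9,26.5,31.1,12.38,4.70
363.0,26.3,29.0,33.5,12.73,4.46
430.0,26.5,29.0,34.0,12.44,5.13
""")

# Column names as described in the post.
columns = ['Weight', 'V-Length', 'D-Length', 'X-Length', 'Height', 'Width']

# Read the comma-separated data into a DataFrame; in practice this would be
# pd.read_csv('Fish.txt', names=columns).
Fish = pd.read_csv(data, names=columns)

# Display the top five rows.
print(Fish.head())
```

In practice, the `StringIO` stand-in would simply be replaced by the path to the data file.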
Here, the fish’s length, height, and width are independent variables, with weight serving as the dependent variable. In machine learning, independent variables are often referred to as features and dependent variables as labels, and these terms will be used interchangeably throughout this blog.
Linear regression models, a fundamental concept you’ll encounter as you learn machine learning, are widely used in statistics and machine learning. These models use a straight line to describe the relationship between an independent variable and a dependent variable. For example, when analyzing the weight of fish, a linear regression model is used to describe the relationship between the weight of the fish and one of the independent variables as follows,
Where is the slope of the line that defines its steepness, and is the y-intercept, the point where line crosses the y-axis.
The dataset contains five independent variables. A simple linear regression model with only one feature can be built by selecting the feature most strongly correlated with the fish’s Weight. One way to accomplish this is to compute the correlation between Weight and each of the features.
After examining the first column of the correlation matrix, the following is observed: Weight has the strongest correlation with the feature X-Length and the weakest correlation with Height. Given this information, it is clear that if the individual is limited to using only one independent variable to predict the dependent variable, they should choose X-Length and not Height.
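The correlation check can be sketched as follows. Synthetic stand-in data is used here (the real Fish.txt isn’t available), constructed so that X-Length drives Weight while Height is unrelated, mirroring the pattern described above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the fish data: X-Length strongly determines
# Weight, while Height is independent noise.
rng = np.random.default_rng(0)
x_length = rng.uniform(20, 45, 50)
Fish = pd.DataFrame({
    'Weight': 30 * x_length + rng.normal(0, 40, 50),
    'X-Length': x_length,
    'Height': rng.uniform(5, 15, 50),
})

# The first column of the correlation matrix shows how strongly each
# variable correlates with Weight.
print(Fish.corr()['Weight'])
```

On the real dataset, `Fish.corr()` would reveal the same kind of ranking across all five features.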
# Step 3: Separating the data into features and labels
X = Fish[['X-Length']]
y = Fish['Weight']
With the features and labels in place, the DataFrame can now be divided into training and test sets. The training set trains the model, while the test set evaluates its performance.
The train_test_split function is imported from the sklearn library to split the data.
The arguments of the train_test_split function can be examined as follows:
test_size=0.3 reserves 30% of the data for testing, leaving the remaining 70% for training. shuffle=True shuffles the rows before splitting so that the split does not depend on the order of the data. As a result, the training data is obtained in the variables X_train and y_train, and the test data in X_test and y_test.
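The split can be sketched as follows; a small synthetic DataFrame stands in for the fish data (the real file isn’t reproduced here), but the `train_test_split` call is the same.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the fish DataFrame.
rng = np.random.default_rng(1)
Fish = pd.DataFrame({'X-Length': rng.uniform(20, 45, 100)})
Fish['Weight'] = 30 * Fish['X-Length'] + rng.normal(0, 40, 100)

# Separate features and label.
X = Fish[['X-Length']]
y = Fish['Weight']

# 70% of the rows go to training and 30% to testing; rows are shuffled
# before splitting. random_state fixes the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42)

print(len(X_train), len(X_test))
```

With 100 rows, this yields 70 training samples and 30 test samples.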
At this point, the linear regression model can be created.
The LinearRegression class is imported from the sklearn library, and the model is fit on X_train and y_train. Remember, 30% of the data was set aside for testing. The Mean Absolute Error (MAE) can be calculated on this data as an indicator of the average absolute difference between the predicted and actual values; a lower MAE value indicates more accurate predictions. Other measures for model validation exist, but they won’t be explored in this context.
Here’s a complete running example that includes all of the steps mentioned above to perform a linear regression.
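The following sketch reconstructs that end-to-end workflow. Synthetic data stands in for Fish.txt (the numbers are made up), so the MAE values are illustrative only; on the real dataset the same steps apply unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the fish data.
rng = np.random.default_rng(2)
x_length = rng.uniform(20, 45, 150)
Fish = pd.DataFrame({'X-Length': x_length,
                     'Weight': 30 * x_length - 300 + rng.normal(0, 40, 150)})

# Separate the data into features and labels.
X = Fish[['X-Length']]
y = Fish['Weight']

# Split into training (70%) and test (30%) sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42)

# Create and train the linear regression model.
model = LinearRegression()
model.fit(X_train, y_train)

# MAE on the training data (seen by the model) ...
train_mae = mean_absolute_error(y_train, model.predict(X_train))
# ... and on the test data (unseen by the model).
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f'Train MAE: {train_mae:.1f}')
print(f'Test MAE:  {test_mae:.1f}')
```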
In this instance, the model.predict() function is applied first to the training data and then to the test data. But what does this show?
Essentially, this approach demonstrates the model’s performance on a known dataset when compared to an unfamiliar test dataset. The two MAE values suggest that the predictions on both train and test data are similar.
Note: It is essential to recall that X-Length was chosen as the feature because of its high correlation with the label. To verify this choice, one can replace it with Height, rerun the linear regression, and compare the two MAE values.
So far, only one feature, X-Length, has been used to train the model. However, additional features are available that can be utilized to improve the predictions: the vertical length, diagonal length, height, and width of the fish. All five features can be used to re-evaluate the linear regression model.
# Step 3: Separating the data into features and labels
X = Fish[['V-Length', 'D-Length', 'X-Length', 'Height', 'Width']]
y = Fish['Weight']
Mathematically, the multiple linear regression model can be written as follows:
Weight = w₀ + w₁x₁ + w₂x₂ + … + wₙxₙ
where wᵢ represents the weightage of feature xᵢ in predicting Weight, and n denotes the number of features.
Following the same steps as earlier, the performance of the model can be calculated using all the features.
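A sketch of the multiple-feature version follows; as before, synthetic data with the five column names from this post stands in for the real Fish.txt, so the MAE value is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in with all five features; the three lengths are
# correlated with each other, as real fish measurements would be.
rng = np.random.default_rng(3)
n = 150
base = rng.uniform(20, 45, n)
Fish = pd.DataFrame({
    'V-Length': base * 0.85 + rng.normal(0, 1, n),
    'D-Length': base * 0.92 + rng.normal(0, 1, n),
    'X-Length': base,
    'Height': rng.uniform(5, 15, n),
    'Width': rng.uniform(3, 7, n),
})
Fish['Weight'] = (25 * Fish['X-Length'] + 10 * Fish['Height']
                  + 15 * Fish['Width'] - 400 + rng.normal(0, 40, n))

# Use all five features this time.
X = Fish[['V-Length', 'D-Length', 'X-Length', 'Height', 'Width']]
y = Fish['Weight']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f'Test MAE: {test_mae:.1f}')
```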
The MAE values will be similar to the results obtained when using a single feature.
Next comes polynomial regression, which is used when the assumption of a linear relationship between the features and the label is not accurate. By allowing a more flexible fit to the data, polynomial regression can capture more complex relationships and lead to more accurate predictions.
For example, if the relationship between the dependent variable and the independent variables is not a straight line, a polynomial regression model can fit it more accurately.
Mathematically, the relationship between the dependent and independent variables is described using the following equation:
Weight = w₀ + w₁z₁ + w₂z₂ + … + wₘzₘ
The above equation looks very similar to the one used earlier to describe multiple linear regression. However, it includes transformed features, the zᵢ’s, which are polynomial versions of the xᵢ’s used in multiple linear regression. This can be further explained using an example of two features, x₁ and x₂, which can be used to create new features x₁, x₂, x₁², x₂², x₁x₂, x₁³, x₂³, and so on.
The new polynomial features can be created based on trial and error or techniques like cross-validation. The degree of the polynomial can also be chosen based on the complexity of the relationship between the variables.
The following example presents a polynomial regression and validates the model’s performance.
The features are transformed using the PolynomialFeatures class, which is imported from the sklearn library for this purpose.
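A sketch of the full polynomial-regression workflow follows; synthetic data with a nonlinear length–weight relationship stands in for the real dataset, so the MAE value is illustrative only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic fish-like data: weight grows nonlinearly (cubically) with length.
rng = np.random.default_rng(5)
x = rng.uniform(20, 45, 150).reshape(-1, 1)
y = 0.02 * x.ravel() ** 3 + rng.normal(0, 40, 150)

X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, shuffle=True, random_state=42)

# Transform the raw feature into polynomial features of degree 3.
poly = PolynomialFeatures(degree=3)
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

# Fit an ordinary linear regression on the transformed features.
model = LinearRegression()
model.fit(X_train_poly, y_train)

test_mae = mean_absolute_error(y_test, model.predict(X_test_poly))
print(f'Test MAE: {test_mae:.1f}')
```

Note that the regression itself is still linear; the nonlinearity comes entirely from the transformed features.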
It should be noted that the MAE value in this case is lower than that of the linear regression models, implying that the linear assumption was not entirely accurate.
This blog has provided a quick introduction to machine learning regression models with Python. Don’t stop here! Explore and practice different techniques and libraries to build more accurate and robust models. You can also check out the following courses on Educative:
A Practical Guide to Machine Learning with Python
This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.
Mastering Machine Learning Theory and Practice
The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.
Hands-on Machine Learning with Scikit-Learn
Scikit-Learn is a powerful library that provides a handful of supervised and unsupervised learning algorithms. If you’re serious about having a career in machine learning, then scikit-learn is a must know. In this course, you will start by learning the various built-in datasets that scikit-learn offers, such as iris and mnist. You will then learn about feature engineering and more specifically, feature selection, feature extraction, and dimension reduction. In the latter half of the course, you will dive into linear and logistic regression where you’ll work through a few challenges to test your understanding. Lastly, you will focus on unsupervised learning and deep learning where you’ll get into k-means clustering and neural networks. By the end of this course, you will have a great new skill to add to your resume, and you’ll be ready to start working on your own projects that will utilize scikit-learn.