I am writing code to analyze the relationship between two variables. I store the variables in two columns of a DataFrame, as follows:

column A = 132.54672, 201.3845717, 323.2654551  
column B = 51.54671995,  96.38457166, 131.2654551

I have tried to use statsmodels but it says that I do not have enough samples.

Can anyone help me? I need to determine the coefficient and the intercept so that I can calculate other variables:

y = coefficient * x + intercept
  • What is your code? Commented Dec 19, 2018 at 0:56
  • Do you really have to use Dataframes? Commented Dec 19, 2018 at 0:59
  • X = df['A'].astype(float)
    Y = df['B'].astype(float)
    # Note the difference in argument order
    model = sm.OLS(Y, X).fit()
    predictions = model.predict(X)  # make the predictions by the model
    # Print out the statistics
    model.summary()
    Commented Dec 19, 2018 at 1:01
  • I can use arrays Commented Dec 19, 2018 at 1:02
  • Please don't print code in comments. Instead edit your post. Commented Dec 19, 2018 at 1:09
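
A note on the statsmodels snippet in the comments: sm.OLS does not add an intercept column by itself, so sm.OLS(Y, X) with a single column fits y = coefficient * x through the origin. The usual remedy is to pass sm.add_constant(X) instead of X. Three points are enough to estimate two parameters, so the fit itself should not need more samples. The sketch below illustrates the effect of the constant column with plain NumPy; np.linalg.lstsq stands in for OLS here, which is a substitution for illustration and not the asker's code.

```python
import numpy as np

x = np.array([132.54672, 201.3845717, 323.2654551])
y = np.array([51.54671995, 96.38457166, 131.2654551])

# Design matrix without a constant column: fits y = coefficient * x only.
X_no_const = x[:, np.newaxis]
coef_only, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# Design matrix with a column of ones: fits y = coefficient * x + intercept.
X_const = np.column_stack([x, np.ones_like(x)])
(coefficient, intercept), *_ = np.linalg.lstsq(X_const, y, rcond=None)

print("through origin:", coef_only[0])
print("with intercept:", coefficient, intercept)
```

The two slopes differ because the first model is forced through (0, 0); only the second one matches the y = coefficient * x + intercept form asked for in the question.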

4 Answers

2

Here is a solution using a DataFrame, with the imports included so that it runs as written.

I am using NumPy's polyfit for a linear (degree 1) fit and wrapping the result in np.poly1d. You can print the fit (fit) to get the slope and the intercept. Because a poly1d object is indexed by power, fit[0] is the intercept and fit[1] is the slope (or coefficient, as you call it).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

column_A = [132.54672, 201.3845717, 323.2654551]
column_B = [51.54671995, 96.38457166, 131.2654551]
df = pd.DataFrame({'A': column_A, 'B': column_B})

# Degree-1 least-squares fit, wrapped in a callable polynomial
fit = np.poly1d(np.polyfit(df['A'], df['B'], 1))

print(fit)
# 0.4028 x + 4.833

# Evenly spaced x values for drawing the fitted line
A_mesh = np.linspace(df['A'].min(), df['A'].max(), 100)

plt.plot(df['A'], df['B'], 'bx', label='Data', ms=10)
plt.plot(A_mesh, fit(A_mesh), '-b', label='Linear fit')
plt.legend()
plt.show()
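
If only the two numbers are needed, a minimal sketch is to unpack the array returned by np.polyfit directly and skip the plot. Note that the raw array is ordered from the highest power down, so the slope comes first. The value x_new below is a hypothetical input chosen only for illustration.

```python
import numpy as np

x = np.array([132.54672, 201.3845717, 323.2654551])
y = np.array([51.54671995, 96.38457166, 131.2654551])

# np.polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# np.poly1d indexes by power instead: fit[1] is the slope, fit[0] the intercept.
fit = np.poly1d([slope, intercept])
print(fit[1], fit[0])

# Use the fitted line to compute y for a new x (hypothetical value).
x_new = 250.0
y_new = slope * x_new + intercept
print(slope, intercept, y_new)
```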

1 Comment

Thank you! You were very helpful! I really appreciate it!
2

You can do this with curve_fit:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.array([132.54672, 201.3845717, 323.2654551])
y = np.array([51.54671995, 96.38457166, 131.2654551])

linear = lambda x, a, b: a * x + b

popt, pcov = curve_fit(linear, x, y, p0=[1, 1])
plt.plot(x, y, "rx")
plt.plot(x, linear(x, *popt), "b-")
plt.title("f(x)=a*x+b, a={:.2f}, b={:.2f}".format(*popt))
plt.show()

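
curve_fit also returns the covariance matrix pcov, which the plot above does not use. A common way to read it, sketched here under the assumption that the default (relative) sigma scaling is acceptable, is to take the square root of its diagonal as one-standard-deviation uncertainties on a and b. With only three points these estimates rest on a single degree of freedom, so treat them as rough.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([132.54672, 201.3845717, 323.2654551])
y = np.array([51.54671995, 96.38457166, 131.2654551])


def linear(x, a, b):
    """Straight line y = a * x + b."""
    return a * x + b


popt, pcov = curve_fit(linear, x, y, p0=[1, 1])
coefficient, intercept = popt

# One-standard-deviation uncertainties from the covariance diagonal.
perr = np.sqrt(np.diag(pcov))

print("coefficient = {:.4f} +/- {:.4f}".format(coefficient, perr[0]))
print("intercept   = {:.4f} +/- {:.4f}".format(intercept, perr[1]))
```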

1

Using scipy.stats:

import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt


column_A= [132.54672, 201.3845717, 323.2654551]
column_B= [51.54671995, 96.38457166, 131.2654551]
df = pd.DataFrame({'A': column_A, 'B': column_B})

reg = stats.linregress(df.A, df.B)

plt.plot(df.A, df.B, 'bo', label='Data')
plt.plot(df.A, reg.intercept + reg.slope * df.A, 'k-', label='Linear Regression')
plt.xlabel('A')
plt.ylabel('B')
plt.legend()
plt.show()

The result object exposes several useful attributes (you can list them with dir(reg)), including

.intercept .pvalue .rvalue .slope .stderr

See the scipy.stats.linregress documentation for details.
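
As a short sketch of how those attributes answer the original question, the snippet below reads the slope and intercept from the result and wraps them in a small helper. The function name predict is hypothetical, introduced only for this example.

```python
from scipy import stats

column_A = [132.54672, 201.3845717, 323.2654551]
column_B = [51.54671995, 96.38457166, 131.2654551]

reg = stats.linregress(column_A, column_B)

print("coefficient:", reg.slope)
print("intercept:", reg.intercept)
print("R-squared:", reg.rvalue ** 2)


def predict(x):
    """Evaluate y = coefficient * x + intercept with the fitted values."""
    return reg.slope * x + reg.intercept


print(predict(250.0))
```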

0

In addition to the previous excellent answers, here is a graphical fitter that has a 3D scatterplot, 3D surface plot, and a contour plot.

import numpy, scipy, scipy.optimize
import matplotlib
from mpl_toolkits.mplot3d import  Axes3D
from matplotlib import cm # to colormap 3D surfaces from blue to red
import matplotlib.pyplot as plt

graphWidth = 800 # units are pixels
graphHeight = 600 # units are pixels

# 3D contour plot lines
numberOfContourLines = 16


def SurfacePlot(func, data, fittedParameters):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)

    # Axes3D(f) no longer attaches itself to the figure in recent matplotlib versions
    axes = f.add_subplot(111, projection='3d')
    axes.grid(True)

    x_data = data[0]
    y_data = data[1]
    z_data = data[2]

    xModel = numpy.linspace(min(x_data), max(x_data), 20)
    yModel = numpy.linspace(min(y_data), max(y_data), 20)
    X, Y = numpy.meshgrid(xModel, yModel)

    Z = func(numpy.array([X, Y]), *fittedParameters)

    axes.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm, linewidth=1, antialiased=True)

    axes.scatter(x_data, y_data, z_data) # show data along with plotted surface

    axes.set_title('Surface Plot (click-drag with mouse)') # add a title for surface plot
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label
    axes.set_zlabel('Z Data') # Z axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot or else there can be memory and process problems


def ContourPlot(func, data, fittedParameters):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    x_data = data[0]
    y_data = data[1]
    z_data = data[2]

    xModel = numpy.linspace(min(x_data), max(x_data), 20)
    yModel = numpy.linspace(min(y_data), max(y_data), 20)
    X, Y = numpy.meshgrid(xModel, yModel)

    Z = func(numpy.array([X, Y]), *fittedParameters)

    axes.plot(x_data, y_data, 'o')

    axes.set_title('Contour Plot') # add a title for contour plot
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    CS = matplotlib.pyplot.contour(X, Y, Z, numberOfContourLines, colors='k')
    matplotlib.pyplot.clabel(CS, inline=1, fontsize=10) # labels for contours

    plt.show()
    plt.close('all') # clean up after using pyplot or else there can be memory and process problems


def ScatterPlot(data):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)

    # Axes3D(f) no longer attaches itself to the figure in recent matplotlib versions
    axes = f.add_subplot(111, projection='3d')
    axes.grid(True)
    x_data = data[0]
    y_data = data[1]
    z_data = data[2]

    axes.scatter(x_data, y_data, z_data)

    axes.set_title('Scatter Plot (click-drag with mouse)')
    axes.set_xlabel('X Data')
    axes.set_ylabel('Y Data')
    axes.set_zlabel('Z Data')

    plt.show()
    plt.close('all') # clean up after using pyplot or else there can be memory and process problems


def func(data, a, alpha, beta):
    t = data[0]
    p_p = data[1]
    return a * (t**alpha) * (p_p**beta)


if __name__ == "__main__":
    xData = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
    yData = numpy.array([11.0, 12.1, 13.0, 14.1, 15.0, 16.1, 17.0, 18.1, 90.0])
    zData = numpy.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.0, 9.9])

    data = [xData, yData, zData]

    initialParameters = [1.0, 1.0, 1.0] # these are the same as scipy default values in this example

    # here a non-linear surface fit is made with scipy's curve_fit()
    fittedParameters, pcov = scipy.optimize.curve_fit(func, [xData, yData], zData, p0 = initialParameters)

    ScatterPlot(data)
    SurfacePlot(func, data, fittedParameters)
    ContourPlot(func, data, fittedParameters)

    print('fitted parameters', fittedParameters)

    modelPredictions = func(data, *fittedParameters) 

    absError = modelPredictions - zData

    SE = numpy.square(absError) # squared errors
    MSE = numpy.mean(SE) # mean squared errors
    RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(zData))
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)
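
The error metrics at the end of the script can be reused for the two-column data from the question. The sketch below puts them in a helper (the name fit_metrics is hypothetical) and also computes the textbook form 1 - SS_res / SS_tot next to the variance-based form used above. The two agree whenever the mean error is zero, as it is for a least-squares line with an intercept, but they can differ for a biased model.

```python
import numpy as np


def fit_metrics(predicted, observed):
    """Return RMSE and two R-squared variants for a set of predictions."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    error = predicted - observed
    rmse = np.sqrt(np.mean(error ** 2))
    # Variance-based form, as in the script above.
    r2_var = 1.0 - np.var(error) / np.var(observed)
    # Textbook form: 1 - SS_res / SS_tot.
    ss_res = np.sum(error ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2_ss = 1.0 - ss_res / ss_tot
    return rmse, r2_var, r2_ss


# Linear fit with intercept on the data from the question.
x = np.array([132.54672, 201.3845717, 323.2654551])
y = np.array([51.54671995, 96.38457166, 131.2654551])
slope, intercept = np.polyfit(x, y, 1)

rmse, r2_var, r2_ss = fit_metrics(slope * x + intercept, y)
print("RMSE:", rmse)
print("R-squared (variance form):", r2_var)
print("R-squared (SS form):", r2_ss)
```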
