
I'm trying to take a second derivative in Python of data stored in two NumPy arrays.

For example, the arrays in question look like this:

import numpy as np

x = np.array([ 120. ,  121.5,  122. ,  122.5,  123. ,  123.5,  124. ,  124.5,
        125. ,  125.5,  126. ,  126.5,  127. ,  127.5,  128. ,  128.5,
        129. ,  129.5,  130. ,  130.5,  131. ,  131.5,  132. ,  132.5,
        133. ,  133.5,  134. ,  134.5,  135. ,  135.5,  136. ,  136.5,
        137. ,  137.5,  138. ,  138.5,  139. ,  139.5,  140. ,  140.5,
        141. ,  141.5,  142. ,  142.5,  143. ,  143.5,  144. ,  144.5,
        145. ,  145.5,  146. ,  146.5,  147. ])

y = np.array([  1.25750000e+01,   1.10750000e+01,   1.05750000e+01,
         1.00750000e+01,   9.57500000e+00,   9.07500000e+00,
         8.57500000e+00,   8.07500000e+00,   7.57500000e+00,
         7.07500000e+00,   6.57500000e+00,   6.07500000e+00,
         5.57500000e+00,   5.07500000e+00,   4.57500000e+00,
         4.07500000e+00,   3.57500000e+00,   3.07500000e+00,
         2.60500000e+00,   2.14500000e+00,   1.71000000e+00,
         1.30500000e+00,   9.55000000e-01,   6.65000000e-01,
         4.35000000e-01,   2.70000000e-01,   1.55000000e-01,
         9.00000000e-02,   5.00000000e-02,   2.50000000e-02,
         1.50000000e-02,   1.00000000e-02,   1.00000000e-02,
         1.00000000e-02,   1.00000000e-02,   1.00000000e-02,
         1.00000000e-02,   1.00000000e-02,   5.00000000e-03,
         5.00000000e-03,   5.00000000e-03,   5.00000000e-03,
         5.00000000e-03,   5.00000000e-03,   5.00000000e-03,
         5.00000000e-03,   5.00000000e-03,   5.00000000e-03,
         5.00000000e-03,   5.00000000e-03,   5.00000000e-03,
         5.00000000e-03,   5.00000000e-03])

So I have f(x) = y, and I want d^2y/dx^2.

Numerically, I know I can either interpolate the function and differentiate the interpolant analytically, or use higher-order finite differences. I think there is enough data for either approach, so I'm open to whichever is considered faster or more accurate.
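For reference, the usual three-point finite-difference approximation of y'' on a non-uniform grid is given below (a standard textbook formula, not something taken from this thread). It is exact for quadratics; on uneven grids its error is in general first order in the spacing, improving to second order when the spacing is uniform, where it reduces to (y[i+1] - 2*y[i] + y[i-1]) / h**2.

```latex
y''(x_i) \approx \frac{2\left[h_{i-1}\,y_{i+1} - (h_{i-1} + h_i)\,y_i + h_i\,y_{i-1}\right]}{h_{i-1}\,h_i\,(h_{i-1} + h_i)},
\qquad h_{i-1} = x_i - x_{i-1}, \quad h_i = x_{i+1} - x_i
```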

I have looked at np.interp() and scipy.interpolate without success: np.interp() only returns linearly interpolated values, and scipy.interpolate gives me a fitted (linear or cubic) spline, but I don't know how to get its derivative at a given point.

Any guidance is much appreciated.

  • Did you have a look at np.diff? Commented Oct 24, 2016 at 20:16
  • My concern is that my data points are not evenly spaced. Commented Oct 24, 2016 at 20:18
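On the uneven-spacing concern: np.gradient accepts the coordinate array itself (NumPy 1.13 and later) rather than a single scalar spacing, so applying it twice is one way to get a second derivative on a non-uniform grid. This is a minimal sketch, not one of the posted answers' methods; second_derivative_gradient is a hypothetical helper name, and edge_order=2 is an assumed choice that needs at least three points.

```python
import numpy as np


def second_derivative_gradient(x, y, edge_order=2):
    """Approximate d^2y/dx^2 on a possibly non-uniform grid.

    Passing the coordinate array x (not a scalar spacing) makes np.gradient
    use the actual, uneven spacing. Interior points use second-order
    accurate central differences; edge_order=2 also makes the one-sided
    boundary formulas second-order accurate (needs at least 3 points).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dydx = np.gradient(y, x, edge_order=edge_order)     # first derivative at each x
    return np.gradient(dydx, x, edge_order=edge_order)  # differentiate once more


# Usage with the question's arrays (result has the same length as x):
# d2y = second_derivative_gradient(x, y)
```

Because the result is evaluated at the original x values, it can be compared point by point with other approaches.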

2 Answers

Answer 1 (score 28)

You can interpolate your data using SciPy's 1-D spline classes. The resulting spline object has a convenient derivative method for computing its derivatives.

For the data in your example, UnivariateSpline gives the following fit:

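import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline

# x, y: the arrays from the question (x must be strictly increasing for s=0)
# s=0 makes the spline pass through every data point; k=4 gives a quartic
# spline, so its second derivative is still continuous
y_spl = UnivariateSpline(x, y, s=0, k=4)

plt.semilogy(x, y, 'ro', label='data')
x_range = np.linspace(x[0], x[-1], 1000)
plt.semilogy(x_range, y_spl(x_range), label='spline fit')
plt.legend()
plt.show()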

[Figure: the data (red dots) and the spline fit on a log-scaled y-axis]

The fit looks reasonably good, at least visually. You may want to experiment with UnivariateSpline's parameters, in particular the spline degree k and the smoothing factor s.

The second derivative of the spline fit can then be obtained simply as:

# derivative(n=2) returns a new spline object representing d^2y/dx^2
y_spl_2d = y_spl.derivative(n=2)

plt.plot(x_range, y_spl_2d(x_range))
plt.show()

[Figure: second derivative of the spline fit over x_range]

The result looks somewhat unnatural (if your data come from a physical process). You could change the spline fit parameters, improve your data (e.g., collect more samples or take less noisy measurements), or choose an analytic function to model your data and fit it (e.g., with SciPy's curve_fit).
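A minimal sketch of the curve-fit route, assuming a hypothetical "softplus" model a*w*log(1 + exp((c - x)/w)) + d. It is picked only because the sample data fall roughly linearly and then level off; it is not derived from any physics behind the data. Its second derivative is known in closed form, (a/w)*s*(1 - s) with s the logistic sigmoid of (c - x)/w, so no numerical differentiation is needed. softplus_model, softplus_second_derivative and fit_and_differentiate are illustrative names, not part of any library.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import expit  # logistic sigmoid, 1 / (1 + exp(-u))


def softplus_model(x, a, c, w, d):
    """Hypothetical model: a * w * log(1 + exp((c - x) / w)) + d.

    Roughly linear with slope -a well below x = c, flattening to d above it.
    """
    u = (c - x) / w
    # logaddexp(0, u) = log(1 + exp(u)), computed without overflow
    return a * w * np.logaddexp(0.0, u) + d


def softplus_second_derivative(x, a, c, w, d):
    """Analytic d^2/dx^2 of softplus_model: (a / w) * s * (1 - s)."""
    s = expit((c - x) / w)
    return (a / w) * s * (1.0 - s)


def fit_and_differentiate(x, y, p0):
    """Fit the model with curve_fit, then evaluate y'' analytically at x."""
    params, _cov = curve_fit(softplus_model, x, y, p0=p0)
    return params, softplus_second_derivative(x, *params)
```

For the question's arrays, a starting guess read off the plot, such as p0=(1.0, 132.5, 1.0, 0.0), seems plausible, but it may need adjusting. Whether such a model is appropriate at all depends on what the data represent.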


3 Comments

This data is supposed to represent a probability density function. What is the best way to normalize this curve and impose constraints such as no negative values?
I don't think there is a standard answer to that, since a generic interpolation method offers few options for imposing constraints. In principle, you would need to formulate and solve a constrained optimization problem from scratch. You might start by normalizing your data, since y_spl.integral(x[0], x[-1]) is about 80, which of course is not valid for a pdf (it must integrate to 1).
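The normalization step suggested in that comment can be sketched as follows, assuming the same interpolating quartic spline (k=4, s=0) as in the answer. normalize_to_unit_area is a hypothetical helper name, and it handles only the area, not the non-negativity constraint.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def normalize_to_unit_area(x, y, k=4, s=0):
    """Rescale y so the spline fit integrates to 1 over [x[0], x[-1]].

    This only fixes the normalization; it does NOT enforce non-negativity
    (an interpolating spline can still dip below zero between samples).
    """
    area = UnivariateSpline(x, y, k=k, s=s).integral(x[0], x[-1])
    y_norm = np.asarray(y, dtype=float) / area
    return y_norm, UnivariateSpline(x, y_norm, k=k, s=s)
```

Since an interpolating spline depends linearly on the data values, dividing y by the area scales the whole spline, and its derivatives, by the same factor.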
How does this answer differ from using ratios of np.diff twice? Better or worse numerically?
Answer 2 (score 14)

Using finite differences, the first derivative of y at the midpoint of each interval [x[i], x[i+1]] is given by:

dy=np.diff(y,1)
dx=np.diff(x,1)
yfirst=dy/dx

The corresponding x values (the interval midpoints) are:

xfirst=0.5*(x[:-1]+x[1:])

For the second derivative, repeat the same process on the first-derivative results:

dyfirst=np.diff(yfirst,1)
dxfirst=np.diff(xfirst,1)
ysecond=dyfirst/dxfirst

xsecond=0.5*(xfirst[:-1]+xfirst[1:])
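The steps above can be collected into one function; second_derivative_midpoints is a hypothetical name. Working through the algebra, each value equals the three-point non-uniform formula for y'' centred on x[i], but it is reported at (x[i-1] + 2*x[i] + x[i+1]) / 4, which coincides with x[i] only when the two neighbouring spacings are equal. That matters for data like the question's, where the first interval is wider than the rest.

```python
import numpy as np


def second_derivative_midpoints(x, y):
    """Apply a first difference twice, tracking the midpoints each time.

    Returns (x_second, y_second), each of length len(x) - 2. On an uneven
    grid, x_second is (x[i-1] + 2*x[i] + x[i+1]) / 4, generally not x[i].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_first = 0.5 * (x[:-1] + x[1:])     # midpoint of each interval
    y_first = np.diff(y) / np.diff(x)    # slope on each interval
    x_second = 0.5 * (x_first[:-1] + x_first[1:])
    y_second = np.diff(y_first) / np.diff(x_first)
    return x_second, y_second
```

For a quadratic the result is exact even on an uneven grid, because the secant slope of a parabola equals its derivative at the interval midpoint.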

1 Comment

np.diff(np.diff([x*x for x in range(0, 10)])) gives [2, 2, 2, ...]
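The comment's example comes out as exactly 2 because the spacing is 1. For any uniform spacing h, the second difference has to be divided by h**2. A minimal sketch, where h and the test function are arbitrary choices:

```python
import numpy as np

h = 0.25                      # uniform spacing (arbitrary choice)
x = np.arange(0.0, 2.0, h)
y = x**2

# np.diff(y, n=2) is the same as np.diff(np.diff(y)); dividing by h**2
# turns the second difference into an approximation of y''.
d2y = np.diff(y, n=2) / h**2  # values correspond to x[1:-1]
```

This shortcut assumes even spacing, so it does not apply directly to the question's x array, whose first interval is 1.5 rather than 0.5.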
