6

I would like to perform Fast Fourier transform on a data series. The series contains values of daily seismic amplitude, sampled consistently across 407 days. I would like to search this dataset for any periodic cycles.

I have made an initial attempt using SciPy documentation here: https://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html. Similar to this question (link), I then changed the argument for y from an artificial sine function to my dataset.

However, I get the following error:

ValueError: x and y must have same first dimension, but have shapes (203,) and (407, 1)

I would appreciate help on understanding why I get this error, and how I can fix it.

I would also appreciate help on the correct frequency and sampling input values I need for FFT to work on my dataset. I have 407 values in my dataset, and each value represents a day. Thus I have defined N (number of sample points) = 407, and T (sample spacing) = 1 / 84600 (1 / number of seconds in a day). Is that correct?

Here is my full code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft, ifft
import pandas as pd

# Import csv file
df = pd.read_csv('rsam_2016-17_fft_test.csv', index_col=['DateTime'], parse_dates=['DateTime'])
print(df.head())

#plot data
plt.figure(figsize=(12,4))
df.plot(linestyle = '', marker = '*', color='r')
plt.show()

#FFT
#number of sample points
N = 407
#frequency of signal
T = 1 / 84600
#create x-axis for time length of signal
x = np.linspace(0, N*T, N)
#create array that corresponds to values in signal
y = df
#perform FFT on signal
yf = fft(y)
#create new x-axis: frequency from signal
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
#plot results
plt.plot(xf, yf)
plt.grid()
plt.show()

Many thanks in advance for your help!


Edit: the full error is below:

Traceback (most recent call last):

File "<ipython-input-40-c090e0039ba9>", line 1, in <module>
runfile('/Users/an16975/Desktop/PhD/Code/FFT/fft_test.py', wdir='/Users/an16975/Desktop/PhD/Code/FFT')

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/an16975/Desktop/PhD/Code/FFT/fft_test.py", line 36, in <module>
plt.plot(xf, yf)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/pyplot.py", line 3317, in plot
ret = ax.plot(*args, **kwargs)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py", line 1898, in inner
return func(ax, *args, **kwargs)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 1406, in plot
for line in self._get_lines(*args, **kwargs):

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 407, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 385, in _plot_args
x, y = self._xy_from_xy(x, y)

File "/Users/an16975/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 244, in _xy_from_xy
"have shapes {} and {}".format(x.shape, y.shape))

ValueError: x and y must have same first dimension, but have shapes (203,) and (407, 1)

Second edit:

SleuthEye's answer allowed me to generate the plots that I was looking for. For anyone interested, the plots are below.

The unfiltered data series -

Daily seismic amplitude values for a period within 2016 - 2017

And below, the plot generated from performing FFT on the above data series -

Fast Fourier Transform of daily seismic amplitude dataset

I hope that is the correct output, and that I have correctly labelled the axes/understood what the second plot represents. I would also like to know why the upper part of the Fourier spectrum is redundant.

For reference, my full (corrected) code is below:

import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft, ifft
import pandas as pd

# Import csv file
df = pd.read_csv('rsam_2016-17_fft_test.csv', index_col=['DateTime'], parse_dates=['DateTime'])
print(df.head())

#plot data
plt.figure(figsize=(12,4))
df.plot(linestyle = '', marker = '*', color='r')
plt.savefig('rsam_2016_2017_snippetforfft.jpg')
plt.show()

#FFT
#number of sample points
N = 407
#frequency of signal (in days)
T = 1
#create x-axis for time length of signal
x = np.linspace(0, N*T, N)
#create array that corresponds to values in signal
y = df
#perform FFT on signal
yf = fft(y)
#create new x-axis: frequency from signal
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
#plot results
plt.plot(xf, yf[0:N//2], label = 'signal')
plt.grid()
plt.xlabel('Frequency (days)')
plt.ylabel(r'Spectral Amplitude')
plt.legend(loc=1)
plt.savefig('rsam_2016_2017_snippet_fft_firstresult.jpg')
plt.show()
1
  • 1
    Please include the complete error message (i.e the complete traceback) in the question. It shows useful information; in particular, it shows the line that generated the error. Commented Feb 5, 2018 at 13:00

2 Answers 2

2

The fft function returns the full N points spectrum (which for real-valued inputs includes the redundant upper half of the spectrum), whereas your frequency axis xf is constructed to cover only the lower half of the spectrum with only N//2 points. Your error relates to the mismatch between those xf and yf array sizes. Due to the redundancy, you may exclude the upper half of the spectrum in yf using yf[0:N//2].

Note also that the array yf contains complex numbers. To display a spectrogram output, you should take the absolute value:

plt.plot(xf, abs(yf[0:N//2]))

Finally as far as sampling period, if you are going to use seconds for sampling period and Hz for frequencies, you should be using T = 86400 (since you have a data point every 1 day or 86400 seconds). You could also alternatively use days for sampling period and day-1 (or cycles/day) for frequencies, in which case you would use T = 1.

Sign up to request clarification or add additional context in comments.

3 Comments

Thank you. This answer did yield a plot that appears to give the result I want. Regarding the "redundancy" of the upper half of the spectrum, is this the Nyquist frequency and the sampling theorem referenced here? explanation
Yes the redundancy I mentioned is the symmetry indicated as "The DFT of a real series, ie: imaginary part of x(k) = 0, results in a symmetric series about the Nyquist frequency" from your link
With T=1 your frequency axis units should be days^(-1), or cycles/day, ie. plt.xlabel('Frequency (cycles/day)').
1
import pandas as pd
import numpy as np
from numpy.fft import rfft, rfftfreq
import matplotlib.pyplot as plt

t=pd.read_csv('C:\\Users\\trial\\Desktop\\EW.csv',usecols=[0])
a=pd.read_csv('C:\\Users\\trial\\Desktop\\EW.csv',usecols=[1])
n=len(a)
dt=0.02 #time increment in each data

acc=a.values.flatten() #to convert DataFrame to 1D array
#acc value must be in numpy array format for half way mirror calculation

fft=rfft(acc)*dt
freq=rfftfreq(n,d=dt)

FFT=abs(fft)

plt.plot(freq,FFT)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.