
I am new to Fourier theory, and I've seen very good tutorials on how to apply an FFT to a signal and plot it to see the frequencies it contains. Somehow, all of them use a synthetic mix of sines as their data, and I am having trouble adapting them to my real problem.

I have 242 hourly observations with a daily periodicity, meaning that my period is 24 hours. So I expect to see a peak around 24 on my FFT plot.
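A 24-hour period corresponds to a frequency of 1/24 cycles per hour, so on a frequency axis the peak should appear near 1/24 (or its equivalent in Hz), not at 24. The sketch below only does that arithmetic, assuming one sample every 3600 s as in the code further down; the variable names are illustrative.

```python
import numpy as np

N = 242            # number of hourly observations (from the question)
dt_s = 3600.0      # assumed sampling interval in seconds (one hour)
period_h = 24.0    # expected period in hours

f_hz = 1.0 / (period_h * dt_s)     # expected frequency in Hz (cycles per second)
f_per_hour = 1.0 / period_h        # the same frequency in cycles per hour
bin_spacing_hz = 1.0 / (N * dt_s)  # frequency resolution of an N-point FFT
k = f_hz / bin_spacing_hz          # fractional FFT bin index of the daily peak

print(f"{f_hz:.3e} Hz, {f_per_hour:.4f} cycles/hour, bin {k:.2f}")
```

Because k = N/24 is not a whole number for N = 242, the daily peak falls between two FFT bins rather than exactly on one.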

A sample of my data.csv is here: https://pastebin.com/1srKFpJQ

Data plotted:

[figure: the series]

My code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv('data.csv',index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values

N = data.shape[0] #number of elements
t = np.linspace(0, N * 3600, N) #converting hours to seconds
s = data

fft = np.fft.fft(s)
T = t[1] - t[0]

f = np.linspace(0, 1 / T, N)
plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.bar(f[:N // 2], np.abs(fft)[:N // 2] * 1 / N, width=1.5)  # 1 / N is a normalization factor
plt.show()

This outputs a strange result in which I seem to get the same value for every frequency.

[figure: the result]

I suppose the problem comes from the definitions of N, t and T, but I cannot find anything online that has helped me understand this clearly. Please help :)
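One way to pin down N, t and T is to define the sampling interval directly and let np.fft.fftfreq build the frequency axis. Note that np.linspace(0, N * 3600, N) spaces the samples N*3600/(N-1) seconds apart, not exactly 3600. The sketch below assumes a hypothetical synthetic series (a 24-hour sine plus a constant offset) in place of data.csv, so the numbers are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the CSV column: 242 hourly samples with a
# 24-hour cycle plus a constant offset (the real data.csv is not used here).
N = 242
hours = np.arange(N)
s = 100.0 + 10.0 * np.sin(2 * np.pi * hours / 24)

T = 3600.0                  # sampling interval in seconds (one sample per hour)
t = np.arange(N) * T        # sample times, exactly T seconds apart

spectrum = np.fft.fft(s)
f = np.fft.fftfreq(N, d=T)  # bin frequencies in Hz, spaced 1/(N*T) apart

half = N // 2               # first half of fftfreq's output: the non-negative frequencies
amplitude = np.abs(spectrum[:half]) / N  # 1/N normalization, as in the question
df = f[1] - f[0]            # bin spacing, about 1.15e-6 Hz for N = 242
```

With these definitions, plt.bar(f[:half], amplitude, width=df) draws one bar per bin. Note that the largest value in this synthetic example is at 0 Hz, because the constant offset (the mean) dominates; the daily line is the largest value once bin 0 is skipped.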

EDIT1:

With the code provided in Charles's answer, I get a spike around 0 that seems very strange. I used rfft and rfftfreq instead, to keep only the non-negative frequencies.

[figure: frequencies]

I have read that this might be due to the DC component of the series, so after subtracting the mean I get:

[figure: frequencies after subtracting the DC component]

I am having trouble interpreting this: the spikes seem to occur periodically, but the values in Hz don't let me recover my 24 value (the period of the series). Does anybody know how to interpret this? What am I missing?
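A frequency in Hz converts back to a period with period = 1/f; dividing by 3600 turns seconds into hours. The sketch below combines the mean subtraction and the rfft/rfftfreq approach described above, assuming the same hypothetical synthetic series (24-hour sine plus offset) instead of the real data.

```python
import numpy as np

# Hypothetical stand-in for the data: 242 hourly samples, daily cycle plus offset.
N = 242
hours = np.arange(N)
s = 100.0 + 10.0 * np.sin(2 * np.pi * hours / 24)

T = 3600.0                          # seconds between samples
detrended = s - s.mean()            # remove the DC component (the spike at 0 Hz)
spectrum = np.abs(np.fft.rfft(detrended))
freqs_hz = np.fft.rfftfreq(N, d=T)  # non-negative frequencies in Hz

peak = np.argmax(spectrum[1:]) + 1  # skip the 0 Hz bin
period_hours = 1.0 / freqs_hz[peak] / 3600.0
print(f"peak at {freqs_hz[peak]:.3e} Hz -> period of {period_hours:.1f} hours")
```

With 242 samples the nearest bin is 10/(242 × 3600 s), which corresponds to a period of 24.2 hours rather than exactly 24; the bins simply do not land on 1/24 cycles per hour.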

1 Comment

A periodicity of one event every 24 hours means a frequency of 1/(24*3600) = 1.15E-5 Hz, so very close to zero on a scale from 0 to 0.5. (Commented Dec 4, 2019 at 13:54)

2 Answers


The problem you're seeing is that the bars are too wide, so they all overlap and you only see one bar. Adjacent FFT bins are 1/(N·T) = 1/(242 × 3600 s) ≈ 1.15e-6 Hz apart, so the bar width has to be about 1e-6 or smaller for the individual bars to show up.

Instead of a bar chart, build the x axis with fftfreq = np.fft.fftfreq(len(s), d=T) and plot the magnitude with plt.plot(fftfreq, np.abs(fft)):

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

data = pd.read_csv('data.csv', index_col=0)
data.index = pd.to_datetime(data.index)
data = data['max_open_files'].astype(float).values

N = data.shape[0]  # number of elements
T = 3600.0  # sampling interval: one hour, in seconds
s = data

fft = np.fft.fft(s)
fftfreq = np.fft.fftfreq(len(s), d=T)  # bin frequencies in Hz

plt.ylabel("Amplitude")
plt.xlabel("Frequency [Hz]")
plt.plot(fftfreq, np.abs(fft))  # FFT values are complex, so plot their magnitude
plt.show()
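np.fft.fftfreq returns the positive frequencies first and the negative ones after them, so plotting its output directly draws a line from the highest positive frequency back to the most negative one. A small sketch of that ordering, and of np.fft.fftshift, which reorders it from most negative to most positive:

```python
import numpy as np

freqs = np.fft.fftfreq(8)            # cycles per sample (d defaults to 1.0)
shifted = np.fft.fftshift(freqs)     # reordered from most negative to most positive
print(freqs)
print(shifted)
```

Applied to the code above, plt.plot(np.fft.fftshift(fftfreq), np.fft.fftshift(np.abs(fft))) would draw the spectrum in increasing frequency order.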

3 Comments

Thank you Charles, I was so confused that I didn't notice the width parameter. I have now edited my question with the solutions you provided.
OK, if that answers your question, please accept the answer. Otherwise, let me know what else you're missing. I think this answer plus @manu190466's comment (that the frequency spike should be at a small, near-zero number) should explain what you're seeing and answer your question.
One more thought: at the start you scaled your time values from hours to seconds, so you might consider scaling your x values from Hz (1/seconds) to 1/hours.

I think that to avoid DFT artifacts, specifically spectral leakage, you should sample a whole number of days, that is, 240 data points. You will then see peaks in the frequency domain corresponding to the daily maxima in your data:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# sample a whole number of days to reduce spectral leakage
N = 240
data = pd.read_csv('data.csv')
data = data['max_open_files'].astype(float).values[:N]

# frequency axis in 1/day units
dt = 1 / 24  # sampling interval: one hour = 1/24 day
T = N * dt  # 10 days, total duration
f0 = 1 / T  # 0.1 per day, fundamental frequency (bin spacing)
freq = np.arange(N) * f0
fft = np.fft.fft(data)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(freq, np.abs(fft), label='fourier abs')

# mark the bins at whole numbers of cycles per day (every 10th bin for 10 days
# of data); bin 0 is the DC component, the others are the daily harmonics
step = round(1 / f0)
ax.plot(freq[::step], np.abs(fft)[::step], 'o', label='fourier abs peaks')

ax.legend()
ax.set_xticks(np.arange(25), minor=True)
ax.grid(linestyle=':', linewidth=0.5, which='minor')
ax.grid(linestyle=':', linewidth=0.5, which='major')
ax.set_ylim(0, 12000)
ax.set_xlim(0, 25)
ax.set_xlabel("Frequency [1/day]")
ax.set_ylabel("Amplitude")
plt.show()

And the plot:

[figure: the plot]
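The effect of trimming to a whole number of days can be checked on a pure signal. The sketch below assumes a hypothetical pure 24-hour sine (not the real data) and measures how much of the spectral magnitude, excluding 0 Hz, lands in the single largest bin; the helper name fraction_in_peak_bin is made up for illustration.

```python
import numpy as np

def fraction_in_peak_bin(n_samples, period=24):
    """Share of the rfft magnitude (0 Hz bin excluded) that falls in the
    largest bin, for a hypothetical pure sine with the given period in samples."""
    hours = np.arange(n_samples)
    s = np.sin(2 * np.pi * hours / period)
    mag = np.abs(np.fft.rfft(s))[1:]  # drop the 0 Hz bin
    return mag.max() / mag.sum()

print(f"240 samples: {fraction_in_peak_bin(240):.3f}")  # exactly 10 days
print(f"242 samples: {fraction_in_peak_bin(242):.3f}")  # 10 days plus 2 hours
```

With 240 samples the daily cycle completes an integer number of times and essentially all of its energy sits in one bin; with 242 it falls between bins and leaks into its neighbours. This is leakage rather than aliasing, which depends on the sampling rate, not on the record length.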

