I am a beginner with Python and programming in general. I am trying to write a program that iterates through a specific NumPy array and detects anomalies within the dataset (an anomaly is defined as any point that is more than 3 standard deviations away from the mean calculated WITHOUT that data point). I need to recalculate the mean and standard deviation each time an anomalous data point is removed.
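Written out, the anomaly test reads as follows. This assumes that both the mean and the standard deviation are taken over the other n - 1 points, and that the standard deviation is the population form that np.std uses by default (ddof=0). The notation \bar{x}_{-i} and s_{-i} ("statistic computed with point i left out") is introduced here for illustration only:

```latex
\[
\bar{x}_{-i} = \frac{1}{n-1}\sum_{j \ne i} x_j,
\qquad
s_{-i} = \sqrt{\frac{1}{n-1}\sum_{j \ne i}\left(x_j - \bar{x}_{-i}\right)^2}
\]
\[
x_i \text{ is an anomaly} \iff \left| x_i - \bar{x}_{-i} \right| > 3\, s_{-i}
\]
```

Because \bar{x}_{-i} and s_{-i} depend on which points are still in the data, both have to be recomputed after every removal.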
I have written the code below, but noticed a couple of issues. After the loop runs, it reports that the value 160 has been removed, but when I print new_array, 160 is still in the array.
Also, how can I recalculate the mean each time a data point is removed? I feel like something is just positioned incorrectly within the for loop. Finally, is my use of continue correct, or should it be placed elsewhere?
import numpy as np

data_array = np.array([
    99.5697438 , 94.47019021, 55., 106.86672855,
    102.78730151, 131.85777845, 88.25376895, 96.94439838,
    83.67782174, 115.57993209, 118.97651966, 94.40479467,
    79.63342207, 77.88602065, 96.59145004, 99.50145353,
    97.25980235, 87.72010069, 101.30597215, 87.3110369 ,
    110.0687946 , 104.71504012, 89.34719772, 160.,
    110.61519268, 112.94716398, 104.41867586])

for cell in data_array:
    mean = np.mean(data_array, axis=0)
    sd = np.std(data_array, axis=0)
    lower_anomaly_point = mean - (3 * sd)
    upper_anomaly_point = mean + (3 * sd)
    if cell > upper_anomaly_point or cell < lower_anomaly_point:
        print(str(cell) + ' has been removed.')
        new_array = np.delete(data_array, cell)
        continue
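A few NumPy details bear on why 160 stays in new_array. The second argument of np.delete is a position (index), not a value, so np.delete(data_array, cell) does not mean "delete the element equal to cell". np.delete also returns a new array and leaves its input untouched, so new_array = np.delete(data_array, ...) starts again from the unchanged data_array on every pass, and mean and sd, which are computed from data_array, never change either. Finally, a continue that is the last statement of a loop body has no effect. The sketch below uses a short hypothetical array (not the question's data) to show deleting by position, the equivalent boolean-mask approach, and reassignment so that removals accumulate:

```python
import numpy as np

# Hypothetical short array, for illustration only
data = np.array([99.5, 94.5, 55.0, 160.0, 102.8])

# np.delete removes by POSITION, so first look up where the value sits
idx = np.flatnonzero(data == 160.0)   # positions holding 160.0, here array([3])
trimmed = np.delete(data, idx)        # returns a NEW array; `data` is unchanged

# Alternative: a boolean mask keeps every element that is not 160.0
trimmed_mask = data[data != 160.0]

# To make removals accumulate, reassign the result to the array you keep using
working = data.copy()
working = np.delete(working, np.flatnonzero(working == 160.0))
working = np.delete(working, np.flatnonzero(working == 55.0))
print(working)
```

Both styles work; the mask is often simpler when you already have a condition, while np.delete is convenient when you already know the index.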
3*std margin:
1. Find the outlier that is farthest away from the mean and delete it.
2. Calculate the new mean and std.
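The two steps above can be sketched as a loop that keeps going until no point is outside the margin. This is a minimal sketch under stated assumptions, not the only way to do it: it applies the question's leave-one-out definition, assumes the standard deviation also excludes the point being tested (the question only says this for the mean), uses np.std's default ddof=0, and measures "farthest" as the distance from each point's own leave-one-out mean. The function name remove_outliers_iteratively is hypothetical.

```python
import numpy as np

def remove_outliers_iteratively(values, k=3.0):
    """Drop outliers one at a time, recomputing the statistics after each removal.

    A point counts as an outlier when it lies more than k standard deviations
    from the mean of the OTHER remaining points (leave-one-out, np.std default
    ddof=0). Returns (kept_array, removed_values_in_removal_order).
    """
    kept = np.asarray(values, dtype=float)
    removed = []
    while kept.size > 2:
        flagged = []  # (distance from leave-one-out mean, index) for each outlier
        for i in range(kept.size):
            others = np.delete(kept, i)          # delete by index, not by value
            distance = abs(kept[i] - others.mean())
            if distance > k * others.std():
                flagged.append((distance, i))
        if not flagged:
            break                                # no point outside the k*std margin
        _, worst = max(flagged)                  # step 1: find the farthest outlier...
        removed.append(float(kept[worst]))
        kept = np.delete(kept, worst)            # ...and delete it; reassigning keeps it deleted
        # step 2 happens on the next pass: mean and std are recomputed from `kept`
    return kept, removed


data_array = np.array([
    99.5697438 , 94.47019021, 55., 106.86672855,
    102.78730151, 131.85777845, 88.25376895, 96.94439838,
    83.67782174, 115.57993209, 118.97651966, 94.40479467,
    79.63342207, 77.88602065, 96.59145004, 99.50145353,
    97.25980235, 87.72010069, 101.30597215, 87.3110369 ,
    110.0687946 , 104.71504012, 89.34719772, 160.,
    110.61519268, 112.94716398, 104.41867586])

kept, removed = remove_outliers_iteratively(data_array)
print("removed, in order:", removed)
print("remaining points:", kept.size)
```

On the question's data, 160 is removed first. The value 55 stays inside the margin while 160 is still inflating the standard deviation, and it is only flagged on the following pass, which is why the mean and standard deviation need to be recomputed after every deletion rather than once up front.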