I have a dataframe which contains two columns. One column shows distance, the other column contains unique 'trackIds' that are associated with a set of distances.
Example:
trackId. distance
2. 17.452
2. 8.650
2. 10.392
2. 11.667
2. 23.551
2. 9.881
3. 6.052
3. 7.241
3. 8.459
3. 22.644
3. 126.890
3. 12.442
3. 5.891
4. 44.781
4. 7.657
4. 36.781
4. 224.001
What I am trying to do is eliminate any trackIds that contain a large spike in distance -- a spike that is > 75. In this example case, track Ids 3 and 4 (and all their associated distances) would be removed from the dataframe because we see spikes in distance greater than 75, thus we would just be left with a dataframe containing trackId 2 and all of its associated distance values.
Here is my code:
i = 0
k = 1
length = len(dataframe)
while i < length:
if (dataframe.distance[k] - dataframe.distance[i]) > 75:
bad_id = dataframe.trackId[k]
condition = dataframe.trackid != bad_id
df2 = dataframe[condition]
i+=1
I tried to use a while loop that was able to go through all the different trackIds, subtract all the distance values and see if the result was > 75, if it was, then the program associated that trackId with the variable 'bad_id' and used that as a condition to filter the dataframe to only contain trackIds that are not equal to the bad_id(s).
I just keep getting nameErrors because I'm unsure of how to properly structure the loop and am in general not sure if this approach works anyways.