I'm trying to improve my algorithm to make it faster. It would also help to have a few extra sets of eyes to catch any bugs or gaps in logic I missed. One thing to note: because of the nature of the algorithm, there will be dead zones at the beginning and end of the data where a peak/valley cannot be detected.
The gist of the algorithm is to step through overlapping windows of the data, detrend each window with a linear fit, and pick out points by standard deviation. Because the windows overlap, we can decide how many "angles" (windows) a point needs to be flagged in before it counts as a valid peak. Then we keep only the highest peak (or lowest valley) from each group of consecutive detections.
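To make the voting idea concrete, here is a minimal sketch (all names hypothetical, not from the code below) of how the overlap determines the window step and how a `Counter` tallies per-index detections, so a vote threshold like `req_angles` can be applied:

```python
from collections import Counter

# step size derived from overlap, as in the algorithm described above
window_size, overlap = 10, 0.5
step = max(1, int(round(window_size * (1 - overlap))))  # -> 5

# pretend index 12 is flagged as a peak in every window that covers it
hits = []
n = 30
for start in range(0, n - window_size + 1, step):
    if start <= 12 < start + window_size:
        hits.append(12)

votes = Counter(hits)
print(votes[12])  # number of windows in which index 12 was flagged -> 2
```

With `req_angles = 2`, index 12 would survive the vote; with `req_angles = 3`, it would be discarded.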
I have highly stationary data; what I may or may not want to detect as a peak varies, and I wanted the most robust algorithm so I can use something like Bayesian optimization to set the best parameters.
This is more or less how I'd like my algorithm to function for my specific use case; I'm not really looking for feedback on that, more on whether my code translates these goals correctly and if it's the most computationally efficient way to do things.
import numpy as np
from collections import Counter
from scipy.signal import detrend


def get_peak_valley(arr, threshold, window_size, overlap, req_angles):
    # validate params
    window_size = int(round(window_size))
    req_angles = max(1, int(round(req_angles)))
    window_step = max(1, int(round(window_size * (1 - overlap))))

    # get all points that classify as a peak/valley
    ind = 0
    peak_inds, valley_inds = [], []
    while ind + window_size <= len(arr):
        flattened = detrend(arr[ind:ind + window_size])
        std, avg = np.std(flattened), np.mean(flattened)
        lower_b = avg - std * threshold
        upper_b = avg + std * threshold
        for idx, val in enumerate(flattened):
            if val < lower_b:
                valley_inds.append(idx + ind)
            elif val > upper_b:
                peak_inds.append(idx + ind)
        ind += window_step

    # discard points flagged in fewer windows than req_angles
    peak_counts = Counter(peak_inds)
    pk_inds = [i for i, c in peak_counts.items() if c >= req_angles]
    valley_counts = Counter(valley_inds)
    vly_inds = [i for i, c in valley_counts.items() if c >= req_angles]

    if len(pk_inds) == 0 or len(vly_inds) == 0:
        return pk_inds, vly_inds

    # set membership is O(1); `x in pk_inds` on a list is O(n) per event
    pk_set = set(pk_inds)

    # seed the iterator with the first event so best_ind starts valid
    # (initializing best_ind to 0 silently returns index 0 whenever the
    # first group's opening value is never beaten)
    event_inds = sorted(pk_inds + vly_inds)
    best_ind = event_inds[0]
    best_val = arr[best_ind]
    curr_event = 'peak' if best_ind in pk_set else 'valley'

    # carry forward only the extreme index of each consecutive group
    new_vly_inds, new_pk_inds = [], []
    for x in event_inds[1:]:
        is_peak = x in pk_set
        if is_peak and curr_event == 'valley':
            new_vly_inds.append(best_ind)
            curr_event, best_val, best_ind = 'peak', arr[x], x
        elif not is_peak and curr_event == 'peak':
            new_pk_inds.append(best_ind)
            curr_event, best_val, best_ind = 'valley', arr[x], x
        elif is_peak and arr[x] > best_val:
            best_val, best_ind = arr[x], x
        elif not is_peak and arr[x] < best_val:
            best_val, best_ind = arr[x], x

    # flush the final group
    if curr_event == 'valley':
        new_vly_inds.append(best_ind)
    else:
        new_pk_inds.append(best_ind)
    return new_pk_inds, new_vly_inds
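On the efficiency question, one useful speed and correctness baseline (a suggestion, not part of your algorithm) is `scipy.signal.find_peaks`, whose `prominence` parameter plays a role loosely analogous to your std-based threshold; comparing its output and runtime against yours on the same series can tell you how much headroom is left:

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(500))  # synthetic non-stationary series

# peaks are local maxima of x; valleys are local maxima of -x
peaks, _ = find_peaks(x, prominence=3.0)
valleys, _ = find_peaks(-x, prominence=3.0)
print(len(peaks), len(valleys))
```

`prominence` (and `distance`) are also continuous knobs, so they slot into a Bayesian-optimization search just as easily as your `threshold` / `window_size` / `overlap` / `req_angles`.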