3

I have an interesting problem but cannot find the relevant literature, so need algorithem suggestions or just the correct way to phrase the problem so I can find the literature.

I'm given a vector of floats of varying length coming from a 2D array. I know the orientation (i.e. which axis I'm looking in). I need to find the start index of the vector. Essentially, I'm cross-correlating a short vector with a long vector.

I have implemented a brute force approach but naturally it grows as O(n^2) where n is the dimension of the 2D array.

Q1) What is an efficient algorithem to tackel this problem?

Q2) How is this type of problem called (so I can find relevant papers)?

There is measurement error so it will never be an exact match.

Here the brute force approach, where I look for the minimum of the norm of two vectors:

import time
import numpy as np


def get_distance(a: np.ndarray, b: np.ndarray) -> float:
    """ Calculate the distance between two vectors. """
    return  np.linalg.norm(a - b)


def find_min_distance_subvector(array: np.ndarray, x: np.ndarray) -> tuple[int, float]:
    leng = len(x)
    min_index = 0
    min_distance = np.inf

    # Assuming we know the orientation of the vector
    for ii in range(0, len(array) - leng):
        # Extract a sub-vector of the same length as x, starting from index ii
        sub_vec = array[ii:ii + leng]
        dist = get_distance(sub_vec, x)
        if dist < min_distance:
            min_distance = dist
            min_index = ii
    return min_index, min_distance


def main():
    leng = 100
    size = 2000

    # Create the search map
    arr = np.random.random((size, size))

    # Pick a random sub-vector from the map
    index = np.random.randint(0, size - leng)
    x = arr[index:index + leng]

    start_time = time.time()
    min_index, min_metric = find_min_distance_subvector(arr, x)
    end_time = time.time()
    print(f"Minimum distance: {min_metric} at index {min_index}, correct index: {index}, "
          f"time taken: {end_time - start_time:.4f} seconds")


if __name__ == '__main__':
    main()

Thank you for your help

8
  • 2
    Can you clarify this with an example? Commented Jun 8 at 20:59
  • 1
    Could the ordering of the map help or will it actually, for all practical purposes, be random? Does the use case allow for building an index into the map that can be reused for futuew searches? Commented Jun 8 at 21:31
  • 2
    You say "vector" but it's two-dimensional, which I think is usually not called a vector... Commented Jun 8 at 21:39
  • 1
    I assume from the sample code that its actually finding a submatrix of leng rows on all (size) columns. Commented Jun 8 at 21:43
  • 1
    I believe I answered this question over here: math.stackexchange.com/questions/2578015/… Commented Jun 9 at 14:54

1 Answer 1

7

Since you know the axis you're working on, let us reduce the problem to a single dimension: you want to find a subsequence matching some needle query="abcd" (or the closest thing to a match) inside a big heystack: corpus="azmpqrfksuxbabcdrtwerl". In other words, in a single dimension this problem can be reduced to a string search problem, with all the known algorithms for solving it.

However, if you're not certain your needle is indeed in the heystack and you want to choose instead some minimum distance subsequence instead of returning a False value (as your code might suggest), you have a different problem on your hands and I'm not aware of better than naive solutions for it.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.