
So, the situation is:

I have two numpy 2D arrays or pandas DataFrames (it doesn't matter which one I use). Each of them contains approximately 10^6 records. Each record is a row of 10 float numbers.

I need to replace each row in the second array (DataFrame) with the row from the first table that has the smallest MSE compared to it. I can easily do it with "for" loops, but that sounds horrifyingly slow. Is there a nice numpy/pandas solution that I'm not seeing?

P.S. For example:

arr1: [[1,2,3],[4,5,6],[7,8,9]]

arr2: [[9,10,11],[3,2,1],[5,5,5]]

The result should be: [[7,8,9],[1,2,3],[4,5,6]]

In this example there are 3 numbers in each record and 3 records in total. I have 10 numbers in each record and around 1,000,000 records in total.

  • Can you give us some test data to work with and any attempts you've made? Commented Aug 10, 2020 at 3:20
  • @DerekEden There you go. As for attempts, I only have the straightforward solution of computing the MSE against every record of the first table for each record in the second table. Commented Aug 10, 2020 at 3:38

1 Answer

Using a nearest neighbor method should work here, especially if you want to cut down on computation time.

I'll give a simple example using scikit-learn's NearestNeighbors class, though there are probably even more efficient ways to do this.

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Example data
X = np.random.randint(1000, size=(10000, 10))
Y = np.random.randint(1000, size=(10000, 10))

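def map_to_nearest(source, query):
    # Index the rows of `source` so nearest-neighbor lookups are fast
    neighbors = NearestNeighbors().fit(source)
    # For each row of `query`, find the index of its closest row in `source`
    indices = neighbors.kneighbors(query, 1, return_distance=False)
    # Return the matched rows from `source`, not from `query`
    return source[indices.ravel()]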

result = map_to_nearest(X, Y)

I'd note that this is calculating Euclidean distances, not MSE. This should be fine for finding the closest match: for rows of a fixed length n, MSE is the squared Euclidean distance divided by n, so both measures rank the candidate rows in the same order.
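To check that ranking by Euclidean distance gives the same matches as ranking by MSE, here is a minimal brute-force sketch in plain numpy, run on the small example from the question. The function name `nearest_rows_by_mse` and the `chunk_size` parameter are my own illustrative choices, not part of any library. It compares every query row with every source row, so it is only meant for verifying small cases, not for 10^6 x 10^6 rows.

```python
import numpy as np

def nearest_rows_by_mse(source, query, chunk_size=1024):
    """Hypothetical helper: for each row of `query`, return the row of
    `source` with the smallest mean squared error (brute force)."""
    source = np.asarray(source, dtype=float)
    query = np.asarray(query, dtype=float)
    result = np.empty_like(query)
    # Process the query in chunks so the intermediate array stays small
    for start in range(0, len(query), chunk_size):
        chunk = query[start:start + chunk_size]
        # diff has shape (rows in chunk, rows in source, numbers per row)
        diff = chunk[:, None, :] - source[None, :, :]
        mse = (diff ** 2).mean(axis=2)
        # Pick, for each query row, the source row with the lowest MSE
        result[start:start + chunk_size] = source[mse.argmin(axis=1)]
    return result

arr1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr2 = [[9, 10, 11], [3, 2, 1], [5, 5, 5]]
matched = nearest_rows_by_mse(arr1, arr2)
print(matched)
```

On this example the matched rows are [7,8,9], [1,2,3] and [4,5,6], which agrees with the result expected in the question.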


1 Comment

OK, this is a very good solution for me. I expected a less specific solution so that I could try different metrics later, but it will do for now.
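On trying different metrics later: scikit-learn's NearestNeighbors accepts a `metric` argument, so the same approach may extend to other distances. The sketch below is an assumption about how that could look, not something from the answer itself; the function name `map_to_nearest_with_metric` is hypothetical, and "manhattan" is just one example of a supported metric.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def map_to_nearest_with_metric(source, query, metric="euclidean"):
    """Hypothetical variant of map_to_nearest with a selectable metric."""
    source = np.asarray(source)
    # Build the neighbor index on `source` using the requested metric
    neighbors = NearestNeighbors(n_neighbors=1, metric=metric).fit(source)
    # Index of the closest `source` row for each row of `query`
    indices = neighbors.kneighbors(query, return_distance=False)
    return source[indices.ravel()]

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[9, 10, 11], [3, 2, 1], [5, 5, 5]])
manhattan_result = map_to_nearest_with_metric(arr1, arr2, metric="manhattan")
print(manhattan_result)
```

On the example from the question, the Manhattan metric happens to pick the same rows as the Euclidean one; on other data the two can disagree.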
