I have created a dataframe with a column for codes and another for discrete values. I also have a numpy array with some experimental values. How do I create a numpy array of the codes associated with the nearest value defined in the dataframe?

Example of the operation

The mapping between the codes and values doesn't have to be a dataframe. I am much more familiar with pandas than numpy, so I tend to lean towards pandas dataframes, and I am not sure what the best way to do this in numpy might be.

This is what I have tried, and it gives the correct result; it's just too slow. My actual data set is 500x1500, and I have over 700 such sets to process, so efficiency and speed are paramount. Ideas? Thoughts? Suggestions? Thanks!

import numpy as np
import pandas as pd


def main():
    npsize = (2,4)
    #Create an array of data between -0.75 and 0.25
    data = np.random.uniform(-0.75,0.25,npsize)
    
    #Pandas dataframe that creates a map
    codes = [np.array([1,2,3]),np.array([4,5,6]),np.array([7,8,9]),np.array([10,11,12]),np.array([13,14,15])]
    values = [-0.75,-0.5,-0.25,0,0.25]
    d = {'code':codes, 'value':values}
    data_map = pd.DataFrame(data=d)

    #I need to associate each element of data to the code within the data_map dataframe by looking up the nearest value
    #For example ... -0.05 ---> [10,11,12] 
    
    #Silly Looping approach ... surely there is a better/faster way to do this!
    mapped_data = np.zeros(shape=(2,4,3))
    xctr = 0
    yctr = 0

    while xctr < npsize[0]:
        #print(xctr)
        while yctr < npsize[1]:
            nearest_code = data_map.iloc[(data_map['value']-data[xctr,yctr]).abs().argsort()[:1]].code.iloc[0]
            mapped_data[xctr,yctr] = nearest_code
            yctr = yctr + 1
        yctr = 0
        xctr = xctr + 1

    print (mapped_data)

if __name__ == "__main__":
    main()
Comments

  • Use a modified searchsorted. Commented Nov 2, 2022 at 19:40
  • Actually, use merge_asof. Commented Nov 2, 2022 at 21:15
  • For future reference, this question is better suited to codereview.stackexchange.com, since you have an implementation that works but needs optimisation. Commented Nov 2, 2022 at 22:41

1 Answer

Don't create codes as a list of arrays of manual values; use a reshaped arange.

Don't create values manually; also use arange.

Avoid holding inner lists; expand your "code" to multiple columns and then an additional dimension in your output array.

And yes, don't loop. NumPy has no direct equivalent of this kind of nearest-key merge, but pandas does: merge_asof. It requires both sides to be sorted on the key, and if you don't need to restore the original order afterwards, your code will be faster.

import numpy as np
import pandas as pd
from numpy.random import default_rng


def main() -> None:
    # Pandas dataframe that creates a map
    code_width = 3
    codes = np.arange(1, 16).reshape((-1, code_width))
    d = {
        f'code_{i}': col
        for i, col in enumerate(codes.T)
    }
    data_map = pd.DataFrame(d, index=np.arange(-0.75, 0.5, 0.25))

    rand = default_rng(seed=0)
    givens = rand.uniform(-0.75, 0.25, (2, 4))
    givens_flat = givens.ravel()
    # Givens need to be sorted. If you don't need to preserve original order, then replace argsort with sort.
    order = givens_flat.argsort()
    givens_flat = givens_flat[order]

    merged_givens = pd.merge_asof(
        left=pd.Series(givens_flat, name='givens'), right=data_map,
        left_on='givens', right_index=True, direction='nearest',
    )

    mapped_givens = np.empty((code_width, len(givens_flat)))
    mapped_givens[:, order] = merged_givens.iloc[:, 1:].T
    mapped_givens = mapped_givens.reshape((-1, *givens.shape))

    print('Map:')
    print(data_map)
    print()

    print('Givens:')
    print(givens)
    print()

    print('Mapped:')
    print(mapped_givens)


if __name__ == "__main__":
    main()
Map:
       code_0  code_1  code_2
-0.75       1       2       3
-0.50       4       5       6
-0.25       7       8       9
 0.00      10      11      12
 0.25      13      14      15

Givens:
[[-0.11303831 -0.48021329 -0.70902648 -0.73347236]
 [ 0.06327024  0.16275558 -0.14336422 -0.02050344]]

Mapped:
[[[10.  4.  1.  1.]
  [10. 13.  7. 10.]]

 [[11.  5.  2.  2.]
  [11. 14.  8. 11.]]

 [[12.  6.  3.  3.]
  [12. 15.  9. 12.]]]
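
One of the comments on the question suggests a modified searchsorted instead. A minimal NumPy-only sketch of that idea follows, assuming the map values are sorted in ascending order. The function name nearest_codes is hypothetical, and exact ties go to the lower value here, which may not match what merge_asof does.

```python
import numpy as np


def nearest_codes(data, values, codes):
    """Map each element of `data` to the code row with the nearest value.

    Assumes `values` (shape (n,)) is sorted ascending and `codes` has
    shape (n, k). Returns an array of shape data.shape + (k,).
    """
    data = np.asarray(data)
    values = np.asarray(values)
    codes = np.asarray(codes)
    # Insertion index of each data point, clipped so that both
    # idx - 1 and idx are valid positions in `values`.
    idx = np.searchsorted(values, data).clip(1, len(values) - 1)
    left = values[idx - 1]
    right = values[idx]
    # Keep the left neighbour where it is at least as close as the right one.
    idx = np.where(data - left <= right - data, idx - 1, idx)
    return codes[idx]


# Same map as in the answer above.
values = np.arange(-0.75, 0.5, 0.25)
codes = np.arange(1, 16).reshape((-1, 3))

# Same givens as printed in the answer's output.
data = np.array([
    [-0.11303831, -0.48021329, -0.70902648, -0.73347236],
    [0.06327024, 0.16275558, -0.14336422, -0.02050344],
])

mapped = nearest_codes(data, values, codes)
print(mapped.shape)
print(mapped[..., 0])
```

The result has the code dimension last, shape (2, 4, 3), like mapped_data in the question, rather than first as in mapped_givens. No sorting of the data is needed, only of the map values.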