I have a 1-dimension numerical dataset (but my question also applies for a n-dimension numerical dataset) which I want to cluster, and I already know the values of the cluster centers. So I only want to map each data point to its associed cluster center (the one which is the closest of the datapoint).
I could write an ad hoc function, but I would really prefer using a Python scientific library optimised to work on pandas.Series or numpy.arrays, as Scipy, because my dataset is very big (hundreds of millions of data points).
How can I do this?
Thank you!