Without numpy:
out = list(map(dict(zip({k:0 for k in original}.keys(), modified)).get, original))
>>> out
[1, 1, 1, 3, 3, 3, 8, 8, 8, 8]
Explanation
So why does it work?
Since Python 3.7, plain dicts preserve insertion order, so {k: 0 for k in original} keeps exactly one entry per distinct value of original, in order of first appearance; its .keys() therefore behave like an ordered set of the unique values. zip() pairs those unique values with modified, dict() turns the pairs into a replacement mapping, and map(<mapping>.get, original) looks up every element of original in that mapping.
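Broken into steps (the input lists below are an assumption, chosen only to be consistent with the output shown above; the names uniques and d are just for illustration):

original = [2, 2, 2, 10, 10, 10, 15, 15, 15, 15]  # assumed example input
modified = [1, 3, 8]

uniques = {k: 0 for k in original}.keys()  # dict_keys([2, 10, 15]) -- unique values, first-appearance order
d = dict(zip(uniques, modified))           # {2: 1, 10: 3, 15: 8}
out = list(map(d.get, original))           # [1, 1, 1, 3, 3, 3, 8, 8, 8, 8]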
Addendum: alternatives and performance
Here are a few other ways to achieve the same result, and how long they take.
import numpy as np
import pandas as pd


def pure_py(om):
    """Pure Python"""
    original, modified = om
    return list(map(dict(zip({k: 0 for k in original}.keys(), modified)).get, original))


def py_with_pd_unique(om):
    """Using a dict for replacement, but using pd.unique() to get the unique values"""
    original, modified = om
    return list(map(dict(zip(pd.unique(original), modified)).get, original))


def np_select(om):
    """Using np.select and assuming inputs are np.array"""
    original, modified = om
    return np.select([original == v for v in pd.unique(original)], modified, original)


def vect_dict_get(om):
    """Using a vectorized dict.get()"""
    original, modified = om
    d = dict(zip(pd.unique(original), modified))
    return np.apply_along_axis(np.vectorize(d.get), 0, original)
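As a quick sanity check, all four kernels return the same values (the arrays below are assumed, chosen to be consistent with the example output at the top):

o = np.array([2, 2, 2, 10, 10, 10, 15, 15, 15, 15])  # assumed example input
m = np.array([1, 3, 8])

results = [np.asarray(f((o, m))) for f in (pure_py, py_with_pd_unique, np_select, vect_dict_get)]
assert all(np.array_equal(results[0], r) for r in results[1:])
print(results[0])  # [1 1 1 3 3 3 8 8 8 8]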
Then:
import perfplot
from math import isqrt


def setup(n):
    original = np.random.randint(0, isqrt(n), n)
    modified = np.arange(len(pd.unique(original)))
    return original, modified


perfplot.show(
    setup=setup,
    n_range=[4 ** k for k in range(4, 11)],
    kernels=[
        pure_py,
        py_with_pd_unique,
        np_select,
        vect_dict_get,
    ],
    xlabel='len(original)',
)
Conclusion: py_with_pd_unique is the fastest throughout the tested range. For 1M elements in original, it is almost twice as fast as the rest:
o, m = setup(1_000_000)
%timeit pure_py((o, m))
# 209 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit py_with_pd_unique((o, m))
# 108 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Two final notes. First, if the values of original are consecutive integers 1, 2, 3, … and you have arrays instead of lists, you can use original directly as indices: modified[original - 1]. Second, you can't use np.unique() instead of the dict trick, as it sorts the values: in this specific example it happens to work (original is ordered), but in general it doesn't. Try with [2, 2, 2, 10, 10, 10, 0, 0, 0] and [1, 3, 8]. Also, np.unique() is relatively slow (because it sorts).
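To make the np.unique() pitfall concrete, here is a minimal sketch using the lists suggested above:

import numpy as np

original = [2, 2, 2, 10, 10, 10, 0, 0, 0]
modified = [1, 3, 8]

print(np.unique(original).tolist())    # [0, 2, 10] -- sorted, first-appearance order is lost
print(list({k: 0 for k in original}))  # [2, 10, 0] -- order of first appearance

# Zipping the sorted uniques with modified would map 0 -> 1, 2 -> 3, 10 -> 8,
# instead of the intended 2 -> 1, 10 -> 3, 0 -> 8.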