I'd like to implement a function in Scala/Spark that takes multiple reducers/aggregators and executes them in a single pass. In other words, I supply the reduce functions and their initial values, and it should build a compound reduce operation that traverses the data only once.

Here is what the logic would look like in Python:
```python
from functools import reduce

def reduce_at_once(data, reducer_funcs_inits):
    reducer_funcs, inits = zip(*reducer_funcs_inits)
    complete_reducer_func = lambda acc, y: tuple(rf(a_x, y) for a_x, rf in zip(acc, reducer_funcs))
    return list(reduce(complete_reducer_func, data, inits))

data = list(range(1, 20))
reducer_funcs_inits = [(lambda acc, y: acc + y, 0),  # sum
                       (lambda acc, y: acc * y, 1)]  # product

print(reduce_at_once(data, reducer_funcs_inits))
# [190, 121645100408832000]
```
How can I do something like this in Scala (Spark)? The difficulty is that the list of reducers has a length I only know at call time, and its elements (the initial accumulators) may have different types depending on which reducers I want to include (not necessarily only numbers, as here).
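For context, the closest I can get in plain Scala (no Spark yet) is to erase the accumulator types to `Any` and cast inside each reducer, which works but loses all type safety:

```scala
val data = (1 to 19).toList

// Each reducer is an (initial value, step function) pair, with types erased to Any.
val reducers: List[(Any, (Any, Int) => Any)] = List(
  (0,  (acc: Any, y: Int) => acc.asInstanceOf[Int] + y),  // sum
  (1L, (acc: Any, y: Int) => acc.asInstanceOf[Long] * y)  // product
)

val (inits, funcs) = reducers.unzip

// Single pass: thread all accumulators through the data together.
val result = data.foldLeft(inits) { (accs, y) =>
  accs.zip(funcs).map { case (a, f) => f(a, y) }
}
// result: List(190, 121645100408832000)
```

Is there a way to express this without the casts, e.g. with an HList or some other heterogeneous container, and in a form that maps onto Spark's `aggregate`?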