1

I have an use case where I need to apply multiple functions to every incoming message, each producing 0 or more results.

Having a loop won't scale for me, and ideally I would like to be able to emit results as soon as they are ready instead of waiting for the all the functions to be applied.

I thought about using AsyncIO for this, maintaining a ThreadPool but if I am not mistaken I can only emit one record using this API, which is not a deal-breaker but I'd like to know if there are other options, like using a ThreadPool but in a Map/Process function so then I can send the results as they are ready.

Would this be an anti-pattern, or cause any problems in regards to checkpointing, at-least-once guarantees?

1 Answer 1

1

Depending on the number of different functions involved, one solution would be to fan each incoming message out to n operators, each applying one of the functions.


I fear you'll get into trouble if you try this with a multi-threaded map/process function.

How about this instead:

You could have something like a RichCoFlatMap (or KeyedCoProcessFunction, or BroadcastProcessFunction) that is aware of all of the currently active functions, and for each incoming event, emits n copies of it, each being enriched with info about a specific function to be performed. Following that can be an async i/o operator that has a ThreadPool, and it takes care of executing the functions and emitting results if and when they become available.

Sign up to request clarification or add additional context in comments.

2 Comments

There could be hundreds if not more, and this number can change during runtime.
Ok, I've offered another suggestion.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.