
I need to build a pipeline with different column aggregations from the same source: one by userId, another by productId, and so on. I also want aggregations at different granularities, say hourly and daily. Each aggregation will have a different sink, e.g. a different NoSQL table.

It seems simple to build each SQL query with the Table API. But I would like to reduce the operational overhead of managing too many Flink apps, so I am thinking of putting all the different SQL queries in one PyFlink app.

This is the first time I have built a Flink app, so I am not sure how feasible this is. In particular, I'd like to know:

  • Reading the Flink docs, I see the concepts of application vs. job. Is each SQL aggregation query a separate Flink job?
  • Will overall performance degrade because there are too many queries in one Flink app?
  • Since the queries share the same source (Kinesis), will each query get its own copy of the source? Basically, I want to make sure every event is processed by each SQL aggregation query.

Thanks!

1 Answer


You can put multiple queries into one job if you use a statement set: https://docs.ververica.com/user_guide/sql_development/sql_scripts.html
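For example, here is a minimal PyFlink sketch of a statement set. The table names, schema, and connector options are hypothetical placeholders, and the `print` sinks stand in for whatever NoSQL sinks you actually use:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical Kinesis source; schema and connector options are placeholders.
t_env.execute_sql("""
    CREATE TABLE events (
        userId STRING,
        productId STRING,
        amount DOUBLE,
        eventTime TIMESTAMP(3),
        WATERMARK FOR eventTime AS eventTime - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kinesis',
        'stream' = 'my-stream',
        'aws.region' = 'us-east-1',
        'format' = 'json'
    )
""")

# Stand-in sinks; in practice each would point at its own NoSQL table.
t_env.execute_sql("""
    CREATE TABLE user_hourly_agg (
        userId STRING, windowStart TIMESTAMP(3), total DOUBLE
    ) WITH ('connector' = 'print')
""")
t_env.execute_sql("""
    CREATE TABLE product_daily_agg (
        productId STRING, windowStart TIMESTAMP(3), total DOUBLE
    ) WITH ('connector' = 'print')
""")

# A statement set bundles several INSERT statements into one job graph.
stmt_set = t_env.create_statement_set()
stmt_set.add_insert_sql("""
    INSERT INTO user_hourly_agg
    SELECT userId, TUMBLE_START(eventTime, INTERVAL '1' HOUR), SUM(amount)
    FROM events
    GROUP BY userId, TUMBLE(eventTime, INTERVAL '1' HOUR)
""")
stmt_set.add_insert_sql("""
    INSERT INTO product_daily_agg
    SELECT productId, TUMBLE_START(eventTime, INTERVAL '1' DAY), SUM(amount)
    FROM events
    GROUP BY productId, TUMBLE(eventTime, INTERVAL '1' DAY)
""")

# execute() submits everything above as a single Flink job.
stmt_set.execute()
```

Each `add_insert_sql` call adds one aggregation to the set, and `execute()` submits them all together as one job.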


1 Comment

Thanks for the link. By the way, do you know what happens to the source events when there are multiple aggregation pipelines? Say it is a Kinesis source: I am concerned that if an event has been processed by one aggregation, the other aggregation may miss it, since they share the same Kinesis consumer. Or will the event be copied inside Flink for each aggregation pipeline? Thanks
