
In my problem I need to query a database and join the query results with a Kafka data stream in Flink. Currently this is done by storing the query results in a file and then using Flink's readFile functionality to create a DataStream of query results. What would be a better approach that bypasses the intermediary step of writing to a file and creates a DataStream directly from the query results?

My current understanding is that I would need to write a custom SourceFunction, as suggested here. Is this the right (and only) way, or are there any alternatives?
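For reference, here is roughly what I imagine such a SourceFunction would look like, a minimal sketch assuming a JDBC database (the connection URL, query, and the comma-joined string output are just placeholders):

```java
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: runs one query against the database and emits each row as a record.
// Once run() returns, the stream is bounded and the source finishes.
public class JdbcQuerySource extends RichSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Placeholder connection details and query.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, payload FROM my_table")) {
            while (running && rs.next()) {
                // A real implementation would map each row to a proper POJO.
                ctx.collect(rs.getString("id") + "," + rs.getString("payload"));
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```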

Are there any good resources for writing custom SourceFunctions, or should I just look at existing implementations for reference and customise them for my needs?

1 Answer


One straightforward solution would be to use a lookup join, perhaps with caching enabled.
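In Flink SQL, a lookup join joins a stream against an external table using a FOR SYSTEM_TIME AS OF clause. A rough sketch with the JDBC connector, assuming the Kafka and JDBC connector jars are on the classpath (topic, table, and column names here are illustrative):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Kafka-backed table; the processing-time column is needed for the lookup join.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id STRING," +
            "  customer_id STRING," +
            "  proc_time AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // JDBC-backed table queried on demand, with a lookup cache enabled.
        tEnv.executeSql(
            "CREATE TABLE customers (" +
            "  customer_id STRING," +
            "  name STRING" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
            "  'table-name' = 'customers'," +
            "  'lookup.cache.max-rows' = '5000'," +
            "  'lookup.cache.ttl' = '10min'" +
            ")");

        // FOR SYSTEM_TIME AS OF makes this a lookup join: each incoming order
        // triggers a (possibly cached) query against the customers table.
        tEnv.executeSql(
            "SELECT o.order_id, c.name " +
            "FROM orders AS o " +
            "JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c " +
            "ON o.customer_id = c.customer_id").print();
    }
}
```

With the lookup cache options set, each parallel task caches the rows it has already looked up, so the database is only hit on cache misses.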

Other possible solutions include Kafka Connect, or using something like Debezium to mirror the database table into Flink. Here's an example: https://github.com/ververica/flink-sql-CDC.
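Following that repo's approach, the database table can be declared as a changelog source directly in SQL; a rough sketch using the mysql-cdc connector (connection details are placeholders, and the flink-connector-mysql-cdc dependency is assumed):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcTableExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Every insert/update/delete on the MySQL table becomes a changelog
        // row in Flink, so the table stays in sync without any file export.
        tEnv.executeSql(
            "CREATE TABLE customers (" +
            "  customer_id STRING," +
            "  name STRING," +
            "  PRIMARY KEY (customer_id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'localhost'," +
            "  'port' = '3306'," +
            "  'username' = 'user'," +
            "  'password' = 'password'," +
            "  'database-name' = 'mydb'," +
            "  'table-name' = 'customers'" +
            ")");
    }
}
```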


1 Comment

To add to David's suggestion, which worked for me: I implemented something similar, and indeed I used a CDC source, which basically gives you a DataStream representing your rows (and their changes over time). After that you can create a stateful function to join the Kafka records with the CDC records, as sketched below. The only thing you need is to be able to key both streams by the same identifiers.
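A minimal sketch of that kind of stateful join, assuming both streams are keyed by the same identifier (the POJO types here are placeholders):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder POJOs for illustration.
class KafkaEvent { public String key; public String payload; }
class CdcRow { public String key; public String value; }
class EnrichedEvent {
    public KafkaEvent event;
    public CdcRow row;
    public EnrichedEvent(KafkaEvent event, CdcRow row) { this.event = event; this.row = row; }
}

// Keeps the latest CDC row per key in state and enriches each Kafka record with it.
public class EnrichmentJoin
        extends KeyedCoProcessFunction<String, KafkaEvent, CdcRow, EnrichedEvent> {

    private transient ValueState<CdcRow> latestRow;

    @Override
    public void open(Configuration parameters) {
        latestRow = getRuntimeContext().getState(
                new ValueStateDescriptor<>("latest-cdc-row", CdcRow.class));
    }

    @Override
    public void processElement1(KafkaEvent event, Context ctx,
                                Collector<EnrichedEvent> out) throws Exception {
        CdcRow row = latestRow.value();
        if (row != null) {
            out.collect(new EnrichedEvent(event, row));
        }
        // Otherwise the matching row hasn't arrived yet; a real job might
        // buffer the event in state instead of dropping it.
    }

    @Override
    public void processElement2(CdcRow row, Context ctx,
                                Collector<EnrichedEvent> out) {
        // Keep only the most recent version of the row for this key.
        latestRow.update(row);
    }
}
```

You would then wire it up with something like kafkaStream.keyBy(e -> e.key).connect(cdcStream.keyBy(r -> r.key)).process(new EnrichmentJoin()).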

