
In my problem I need to query a database and join the query results with a Kafka data stream in Flink. Currently this is done by storing the query results in a file and then using Flink's readFile functionality to create a DataStream of query results. What would be a better approach that bypasses the intermediary step of writing to a file and creates a DataStream directly from the query results?

My current understanding is that I would need to write a custom SourceFunction, as suggested here. Is this the right (and only) way, or are there any alternatives?
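For reference, here is roughly what I imagine such a SourceFunction would look like, a minimal sketch assuming a JDBC database (the connection URL, query, and the comma-joined string output are just placeholders):

```java
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: runs one query against the database and emits each row as a record.
// Once run() returns, the stream is bounded and the source finishes.
public class JdbcQuerySource extends RichSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Placeholder connection details and query.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, payload FROM my_table")) {
            while (running && rs.next()) {
                // A real implementation would map each row to a proper POJO.
                ctx.collect(rs.getString("id") + "," + rs.getString("payload"));
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```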

Are there any good resources for writing custom SourceFunctions, or should I just look at existing implementations for reference and customise them for my needs?

1 Answer


One straightforward solution would be to use a lookup join, perhaps with caching enabled.
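In Flink SQL, a lookup join joins a stream against an external table using a FOR SYSTEM_TIME AS OF clause. A rough sketch with the JDBC connector, assuming the Kafka and JDBC connector jars are on the classpath (topic, table, and column names here are illustrative):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Kafka-backed table; the processing-time column is needed for the lookup join.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id STRING," +
            "  customer_id STRING," +
            "  proc_time AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // JDBC-backed table queried on demand, with a lookup cache enabled.
        tEnv.executeSql(
            "CREATE TABLE customers (" +
            "  customer_id STRING," +
            "  name STRING" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
            "  'table-name' = 'customers'," +
            "  'lookup.cache.max-rows' = '5000'," +
            "  'lookup.cache.ttl' = '10min'" +
            ")");

        // FOR SYSTEM_TIME AS OF makes this a lookup join: each incoming order
        // triggers a (possibly cached) query against the customers table.
        tEnv.executeSql(
            "SELECT o.order_id, c.name " +
            "FROM orders AS o " +
            "JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c " +
            "ON o.customer_id = c.customer_id").print();
    }
}
```

With the lookup cache options set, each parallel task caches the rows it has already looked up, so the database is only hit on cache misses.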

Other possible solutions include Kafka Connect, or using something like Debezium to mirror the database table into Flink. Here's an example: https://github.com/ververica/flink-sql-CDC.
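Following that repo's approach, the database table can be declared as a changelog source directly in SQL; a rough sketch using the mysql-cdc connector (connection details are placeholders, and the flink-connector-mysql-cdc dependency is assumed):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcTableExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Every insert/update/delete on the MySQL table becomes a changelog
        // row in Flink, so the table stays in sync without any file export.
        tEnv.executeSql(
            "CREATE TABLE customers (" +
            "  customer_id STRING," +
            "  name STRING," +
            "  PRIMARY KEY (customer_id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'localhost'," +
            "  'port' = '3306'," +
            "  'username' = 'user'," +
            "  'password' = 'password'," +
            "  'database-name' = 'mydb'," +
            "  'table-name' = 'customers'" +
            ")");
    }
}
```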


1 Comment

To add to David's suggestion, which worked for me: I implemented something similar, and indeed I used a CDC source, which basically gives you a DataStream representing your rows (and their changes over time). After that you can create a stateful function to join the Kafka records with the CDC records, as sketched below. The only thing you need is to be able to key both streams by the same identifiers.
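A minimal sketch of that kind of stateful join, assuming both streams are keyed by the same identifier (the POJO types here are placeholders):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

// Placeholder POJOs for illustration.
class KafkaEvent { public String key; public String payload; }
class CdcRow { public String key; public String value; }
class EnrichedEvent {
    public KafkaEvent event;
    public CdcRow row;
    public EnrichedEvent(KafkaEvent event, CdcRow row) { this.event = event; this.row = row; }
}

// Keeps the latest CDC row per key in state and enriches each Kafka record with it.
public class EnrichmentJoin
        extends KeyedCoProcessFunction<String, KafkaEvent, CdcRow, EnrichedEvent> {

    private transient ValueState<CdcRow> latestRow;

    @Override
    public void open(Configuration parameters) {
        latestRow = getRuntimeContext().getState(
                new ValueStateDescriptor<>("latest-cdc-row", CdcRow.class));
    }

    @Override
    public void processElement1(KafkaEvent event, Context ctx,
                                Collector<EnrichedEvent> out) throws Exception {
        CdcRow row = latestRow.value();
        if (row != null) {
            out.collect(new EnrichedEvent(event, row));
        }
        // Otherwise the matching row hasn't arrived yet; a real job might
        // buffer the event in state instead of dropping it.
    }

    @Override
    public void processElement2(CdcRow row, Context ctx,
                                Collector<EnrichedEvent> out) {
        // Keep only the most recent version of the row for this key.
        latestRow.update(row);
    }
}
```

You would then wire it up with something like kafkaStream.keyBy(e -> e.key).connect(cdcStream.keyBy(r -> r.key)).process(new EnrichmentJoin()).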

