How we can read data from relational database using custom data source. I am new to flink streaming. I am facing problem while adding new custom data-source. So please help me to add custom data source and read data continuously from source DB.
1 Answer
As suggested by Chengzhi, relational databases are not designed to be processed in a streaming fashion and it would be better to use Kafka, Kinesis or some other system for that.
However you could write a custom source function that uses a JDBC connection to fetch the data. It would have to continuously query the DB for any new data. The issue here is that you need a way to determine which data you have already read/processed and which you did not. From the top of my head you could use a couple of things, like remembering what was the last processed primary key, and use it in subsequent query like:
SELECT * FROM events WHERE event_id > $last_processed_event_id;
Alternatively you could clear the events table inside some transaction like:
SELECT * FROM unprocessed_events;
DELETE FROM unprocessed_events WHERE event_id IN $PROCESSED_EVENT_IDS;
event_id can be anything that lets you uniquely identify the records, maybe it could be some timestamp or a set of fields.
Another thing to consider is that you would have to manually take care of checkpointing (of the last_processed_even_id offset) if you want to provide any reasonable at-least-once or exactly-once guarantees.