
How can we read data from a relational database using a custom data source? I am new to Flink streaming and am having trouble adding a new custom data source. Please help me add a custom data source and read data continuously from the source DB.

  • Pulling directly from a relational database doesn't do streaming, though you can still do it as a batch process. Do you have a message queue like Kafka or Kinesis you can pull data from? Commented Jan 9, 2018 at 16:24

1 Answer


As Chengzhi suggested, relational databases are not designed to be consumed in a streaming fashion, and it would be better to use Kafka, Kinesis, or some other system for that.

However, you could write a custom source function that uses a JDBC connection to fetch the data. It would have to poll the DB continuously for new data. The issue here is that you need a way to determine which data you have already read/processed and which you have not. Off the top of my head, you could use a couple of things, like remembering the last processed primary key and using it in the subsequent query:

SELECT * FROM events WHERE event_id > $last_processed_event_id;
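As an illustration of that polling pattern, here is a minimal, self-contained sketch. The class name and methods are hypothetical, and an in-memory `TreeMap` stands in for the `events` table so the example is runnable; in a real Flink `SourceFunction` the same loop would live in `run()`, issuing the `SELECT` above through a `java.sql.Connection`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the "remember the last processed primary key" polling pattern.
// A TreeMap keyed by event_id stands in for the events table; a real
// source would run "SELECT * FROM events WHERE event_id > ?" over JDBC.
class PollingOffsetSketch {
    private final NavigableMap<Long, String> eventsTable = new TreeMap<>();
    private long lastProcessedEventId = 0L;

    // Stand-in for rows arriving in the source DB.
    public void insertEvent(long eventId, String payload) {
        eventsTable.put(eventId, payload);
    }

    // One polling iteration: fetch every row above the stored offset,
    // emit it, and advance the offset to the highest id seen.
    public List<String> poll() {
        SortedMap<Long, String> newRows =
                eventsTable.tailMap(lastProcessedEventId, false);
        List<String> emitted = new ArrayList<>(newRows.values());
        if (!newRows.isEmpty()) {
            lastProcessedEventId = newRows.lastKey();
        }
        return emitted;
    }

    public long lastProcessedEventId() {
        return lastProcessedEventId;
    }
}
```

Note that rows already emitted are never returned again, because each poll starts strictly above the stored offset.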

Alternatively, you could clear the events table inside a transaction, like:

SELECT * FROM unprocessed_events;
DELETE FROM unprocessed_events WHERE event_id IN $PROCESSED_EVENT_IDS;

event_id can be anything that lets you uniquely identify the records; it could be a timestamp or a set of fields.
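The consume-and-delete variant can be sketched the same way. The class and method names here are hypothetical, and a `LinkedHashMap` stands in for the `unprocessed_events` table so the example runs on its own; in a real source the `SELECT` and `DELETE` above would execute inside a single JDBC transaction so rows are removed only once they have been read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the consume-and-delete pattern: read everything from
// unprocessed_events, then delete exactly the ids that were processed.
// A LinkedHashMap stands in for the table; in a real source the SELECT
// and DELETE would run inside one JDBC transaction.
class ConsumeDeleteSketch {
    private final Map<Long, String> unprocessedEvents = new LinkedHashMap<>();

    // Stand-in for rows arriving in the source DB.
    public void insertEvent(long eventId, String payload) {
        unprocessedEvents.put(eventId, payload);
    }

    // Returns the processed payloads and removes the matching rows, as
    // "DELETE FROM unprocessed_events WHERE event_id IN (...)" would.
    public List<String> consume() {
        List<Long> processedIds = new ArrayList<>(unprocessedEvents.keySet());
        List<String> payloads = new ArrayList<>(unprocessedEvents.values());
        processedIds.forEach(unprocessedEvents::remove);
        return payloads;
    }

    public int remaining() {
        return unprocessedEvents.size();
    }
}
```

The trade-off versus the offset approach is that this one mutates the source table, so it only works when the table exists solely to feed the stream.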

Another thing to consider is that you would have to take care of checkpointing (of the last_processed_event_id offset) manually if you want to provide any reasonable at-least-once or exactly-once guarantees.
