Couldn't find any examples online but what I'm trying to do is basically use Java Spring Batch to read in a whole table in postgres and then for each row, publish that data elsewhere. I read https://spring.io/guides/gs/batch-processing/ but can't figure out how to do this. I also want to space out the data retrieval so my database doesn't get blocked up. There are a lot of examples that read from a CSV file, but I can't find how to read from a Repository.
2 Answers
To read the table, use one of the readers that Spring Batch provides: either org.springframework.batch.item.data.RepositoryItemReader or org.springframework.batch.item.database.JdbcPagingItemReader.
Both readers implement pagination, so the table is read from the database page by page rather than all at once.
RepositoryItemReader has a setPageSize(int pageSize) method, and JdbcPagingItemReader has a similar one. Pagination requires a column in your table that the rows can be ordered by.
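To show why an orderable column matters, here is a minimal plain-Java sketch of paging by a sort key. It does not use Spring Batch and is not how either reader is implemented; the class `SortedPagingDemo`, its methods, and the in-memory list standing in for the table are all hypothetical, and ids are assumed to be positive integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration: paging over an in-memory "table" ordered by id. */
class SortedPagingDemo {

    /** Stand-in for a query like "WHERE id > ? ORDER BY id LIMIT ?". */
    static List<Integer> fetchPage(List<Integer> tableIds, int afterId, int pageSize) {
        return tableIds.stream()
                .filter(id -> id > afterId)   // skip rows already returned
                .sorted()                     // stable order is what makes pages well defined
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    /** Reads the whole table one page at a time, remembering the last id seen. */
    static List<List<Integer>> readAllPages(List<Integer> tableIds, int pageSize) {
        List<List<Integer>> pages = new ArrayList<>();
        int lastSeenId = Integer.MIN_VALUE;   // assumption: real ids are larger than this
        while (true) {
            List<Integer> page = fetchPage(tableIds, lastSeenId, pageSize);
            if (page.isEmpty()) {
                break;                        // no more rows
            }
            pages.add(page);
            lastSeenId = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Rows stored in arbitrary order; prints [[1, 3], [5, 7], [9]]
        System.out.println(readAllPages(Arrays.asList(7, 3, 9, 1, 5), 2));
    }
}
```

Without a column to order by, there would be no reliable way to say where one page ends and the next begins, so rows could be skipped or returned twice.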
Look for code examples that use these two readers.
These readers read one page, keep it in memory, and hand over items one at a time until the chunk size is reached, at which point the commit happens. The next DB read does not happen until the current page is fully consumed. Generally, for optimal performance, the chunk size should be a few times smaller than the page size, e.g. reader page size = 1000 and chunk size = 100, so 1000 items are read at once and committed in chunks of 100 items.
The next DB read happens once all 1000 previously read items have been passed to the processor.
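The relationship between page size and chunk size can be checked with a small plain-Java simulation. This is only a sketch of the counting, not Spring Batch code: the class `PageChunkDemo` and its `simulate` method are hypothetical, and a real paging reader may issue one extra query that returns no rows to detect the end of the data, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of page-based reading with chunked commits. */
class PageChunkDemo {

    /** Returns {pageReads, commits} for a table of totalRows rows. */
    static int[] simulate(int totalRows, int pageSize, int chunkSize) {
        int pageReads = 0;
        int commits = 0;
        int nextRow = 0;                          // next row to fetch from the "table"
        List<Integer> page = new ArrayList<>();   // rows currently buffered in memory
        int pos = 0;                              // position inside the buffered page
        List<Integer> chunk = new ArrayList<>();  // items collected since the last commit

        while (true) {
            if (pos == page.size()) {             // page fully consumed: fetch the next one
                if (nextRow >= totalRows) {
                    break;                        // table exhausted
                }
                page.clear();
                int end = Math.min(nextRow + pageSize, totalRows);
                for (; nextRow < end; nextRow++) {
                    page.add(nextRow);
                }
                pos = 0;
                pageReads++;
            }
            chunk.add(page.get(pos++));           // hand one item over
            if (chunk.size() == chunkSize) {      // chunk full: write and commit
                commits++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            commits++;                            // final partial chunk
        }
        return new int[] {pageReads, commits};
    }

    public static void main(String[] args) {
        int[] r = simulate(1000, 1000, 100);
        // prints "1 page read(s), 10 commit(s)"
        System.out.println(r[0] + " page read(s), " + r[1] + " commit(s)");
    }
}
```

Calling `simulate(5, 2, 1)` gives 3 page reads and 5 commits, which corresponds to the case where each item is committed on its own.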
> then for each row, publish that data elsewhere
To accomplish the above, set the chunk size to one; then in your writer you can do whatever you wish with the item, and the transaction will be committed after each item.
2 Comments
> Couldn't find any examples online
Have you seen the official samples here: https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples ?
There are many examples that show how to read data from a database:
https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples#hibernate-sample
https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples#football-job
https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples#trade-job
> what I'm trying to do is basically use Java Spring Batch to read in a whole table in postgres and then for each row, publish that data elsewhere.
All the jobs in the previous samples have at least one step that reads data from a database and writes it elsewhere.
> I also want to space out the data retrieval so my database doesn't get blocked up
I would recommend using one of the paging item readers (see https://docs.spring.io/spring-batch/4.0.x/reference/html/readersAndWriters.html#pagingItemReaders) to read the data in pages instead of opening a cursor on the whole table.