0

I'm using java configuration (spring-boot) for spring batch. I have a list of Employee Ids and for each Id, I need to run a query (like below) and then process the data.

select * from history where employee_id = ?

I understand we can use reader.setPreparedStatementSetter to dynamically set the parameter in the above SQL. However, I'm not sure how I can repeat the batch process for each of the employee id in the list. Even if I mark the reader() as @StepScope, the reader is called only once. (i.e.), the batch runs only once. Any help is appreciated.

    List employeeIds = new ArrayList();
    employeeIds.add(1);
    employeeIds.add(2);

    @Bean
    @StepScope
    public ItemReader<History> reader() {
        JdbcCursorItemReader<History> databaseReader = new JdbcCursorItemReader<>();
        databaseReader.setSql("select * from history where employee_id = ?");
        databaseReader.setPreparedStatementSetter(..);
        ....

        return databaseReader;
    }


    @Bean
    public Step step(StepBuilder stepBuilder){
        return stepBuilderFactory.get("sample").
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build();
    }
3

1 Answer 1

3

So first and foremost, I don't recommend you do this. In theory, and almost always in practice, opening a single cursor is FAR more efficient than querying separately for each time.

The "better" way would be to insert the ID list into a staging/driving table (typically in an earlier step, whenever you get that list of IDs from the original source) and then change your query to be something like:

select * from history where employee_id in (select id from driving_table)

Alternatively, you could, at a bare minimum, at least change your query to:

select * from history where employee_id in ( ? ) 

and pass in a list of IDs to it (be careful here as some databases limit the # of parameters in a query). If your list threatens to exceed that limit, you'd need to open a new cursor, effectively paginating over the list.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks. The list contains about 200000 records - so "IN" wouldn't work. I could use JOIN instead.
Absolutely do the join. It will perform far better than pulling the data back just to query with it again and will reduce unnecessary I/O.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.