
I'm currently trying to read a CSV file and want to create, for example, a user for each row.

Currently I have this import method:

@Override
@Transactional
public void importUserData(final UserDataDTO toBeSavedDTO) {
    final String file = decodeFile(toBeSavedDTO); // Base64 decoding

    final List<UserDTO> imports = parser.parseImport(file);

    if (imports == null || imports.isEmpty()) {
        throw new RuntimeException("No user data to import");
    }

    // delete existing entries
    dao.deleteUserEntries();

    imports.forEach(dao::createUser);
}

Here I decode the file and parse it into a list of UserDTO. Then I delete the existing entries and create one for each user, which basically does a persist per row. This works, but it takes a lot of time and is not performant.

Is there a way to do a batch insert or something like that? Maybe with a flush after every batch?

  • Is your DAO layer doing a commit after every user create? If so, disable that and do a "begin transaction" <insert><insert><insert>.... "commit" and it should speed it up and give you the "flush after every batch" you're looking for. Since you don't say anything more about your DAO, I can't be more specific. Commented Jun 23 at 14:36
  • This question is similar to: Hibernate batch insert. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Jun 24 at 5:29

1 Answer


Bulk inserts are significantly more performant than inserting each record individually. Spring + JPA (or Hibernate) doesn’t do batch inserts out of the box unless explicitly configured, but with a few optimizations, you can improve performance dramatically.

In application.properties or application.yml, configure Hibernate to use batch inserts:

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.generate_statistics=true

This tells Hibernate to group INSERT statements into batches of 50 (or whatever size you prefer). generate_statistics is optional but useful here: the session statistics Hibernate logs include how many JDBC batches were executed, which lets you verify that batching is actually happening.
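If the database is MySQL, one more knob is worth knowing about: the Connector/J driver only rewrites a JDBC batch into a single multi-row INSERT when rewriteBatchedStatements is enabled on the JDBC URL. A minimal example, assuming a local database named mydb (adjust host, schema, and credentials to your setup):

spring.datasource.url=jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true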

Modify your loop to flush and clear the EntityManager every N records to avoid memory bloat and allow batching to trigger:

@PersistenceContext
private EntityManager entityManager;

@Override
@Transactional
public void importUserData(final UserDataDTO toBeSavedDTO) {
    final String file = decodeFile(toBeSavedDTO); // Base64 decoding

    final List<UserDTO> imports = parser.parseImport(file);

    if (imports == null || imports.isEmpty()) {
        throw new RuntimeException("No user data to import");
    }

    dao.deleteUserEntries(); // Clear the table first

    int batchSize = 50;
    int count = 0;

    for (UserDTO dto : imports) {
        dao.createUser(dto);  // Persists a User entity
        count++;

        if (count % batchSize == 0) {
            entityManager.flush();
            entityManager.clear(); // Detach entities from persistence context
        }
    }

    entityManager.flush(); // flush remaining
    entityManager.clear();
}

Make sure dao.createUser() persists with EntityManager.persist(entity) rather than CrudRepository.save(): for entities that already carry an ID, save() falls back to merge(), which fires an extra SELECT per row and undermines the batching you just configured. With persist() the entity goes straight into the persistence context:

public void createUser(UserDTO dto) {
    User entity = mapDtoToEntity(dto);
    entityManager.persist(entity); // This supports batching
}
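
One more caveat that often blocks batching entirely: Hibernate disables JDBC insert batching when the entity ID uses GenerationType.IDENTITY, because it must execute each INSERT immediately to obtain the generated key. Below is a sketch of a sequence-based mapping that keeps batching possible (the entity fields and the user_seq sequence name are assumptions, adapt them to your schema):

@Entity
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "user_seq")
    @SequenceGenerator(name = "user_seq", sequenceName = "user_seq", allocationSize = 50)
    private Long id;

    // other columns omitted
}

Matching allocationSize to the JDBC batch size also means Hibernate only hits the sequence once per 50 inserts instead of once per row.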