
I am reading a CSV file line by line. It contains more than 7 million lines and takes up more than 1 GB on disk.

Reading the file into a List<String> works fine and takes less than 2 minutes. The problem comes when I loop over this list to map each line to a Balance object that I created: I get an OutOfMemoryError:

01:00:30.664 [restartedMain] ERROR org.springframework.batch.core.step.AbstractStep - Encountered an error executing step readInputStep in job readCsvJob
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68) ~[?:1.8.0_172]
    at java.lang.StringBuffer.<init>(StringBuffer.java:128) ~[?:1.8.0_172]
    at java.text.DigitList.getStringBuffer(DigitList.java:804) ~[?:1.8.0_172]
    at java.text.DigitList.getDouble(DigitList.java:164) ~[?:1.8.0_172]
    at java.text.DecimalFormat.parse(DecimalFormat.java:2089) ~[?:1.8.0_172]
    at java.text.NumberFormat.parse(NumberFormat.java:383) ~[?:1.8.0_172]
    at fr.payet.flad.batch.mapper.BalanceLineMapper.parseToDouble(BalanceLineMapper.java:56) ~[classes/:?]
    at fr.payet.flad.batch.mapper.BalanceLineMapper.toBalance(BalanceLineMapper.java:40) ~[classes/:?]
    at fr.payet.flad.batch.tasklet.ReadInputTasklet.execute(ReadInputTasklet.java:56) ~[classes/:?]

Here is my BalanceLineMapper code:

@Component
@Slf4j
public class BalanceLineMapper {

    public Balance toBalance(String[] ligneCsv, int cursorIndex) {
        try {
            return Balance.builder()
                    .index(cursorIndex)
                    .exer(ligneCsv[0])
                    .ident(ligneCsv[1])
                    .nDept(ligneCsv[2])
                    .lBudg(ligneCsv[3])
                    .insee(ligneCsv[4])
                    .siren(ligneCsv[5])
                    .cRegi(ligneCsv[6])
                    .nomen(ligneCsv[7])
                    .cType(ligneCsv[8])
                    .cstyp(ligneCsv[9])
                    .cActi(ligneCsv[10])
                    .finess(ligneCsv[11])
                    .secteur(ligneCsv[12])
                    .cBudg(ligneCsv[13])
                    .codBud1(ligneCsv[14])
                    .compte(ligneCsv[15])
                    .BEDeb(ligneCsv[16])
                    .BECre(parseToDouble(ligneCsv[17]))
                    .OBNetDeb(parseToDouble(ligneCsv[18]))
                    .OBNetCre(parseToDouble(ligneCsv[19]))
                    .ONBDeb(parseToDouble(ligneCsv[20]))
                    .ONBCre(parseToDouble(ligneCsv[21]))
                    .OOBDeb(parseToDouble(ligneCsv[22]))
                    .OOBCre(parseToDouble(ligneCsv[23]))
                    .sd(parseToDouble(ligneCsv[24]))
                    .sc(parseToDouble(ligneCsv[25]))
                    .build();
        } catch (NumberFormatException e) {
            log.debug("Erreur lors de du casting");
        }
        return null;
    }

    private Double parseToDouble(String number){
        NumberFormat format = NumberFormat.getInstance(Locale.FRANCE);
        try {
             return format.parse(number).doubleValue();
        }catch (ParseException e){
            log.error("Erreur de parsing de {} en Java Double", number, e.getMessage(), e);
        }
        log.error("parseToDouble retourne la valeur NULL");
        return null;
    }

}

And here is the ReadInputTasklet code:

@Slf4j
@Component
public class ReadInputTasklet implements Tasklet, StepExecutionListener {

    @Autowired
    BalanceLineMapper balanceLineMapper;

    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
        List<Balance> balances = Lists.newArrayList();
        List<String> balancesList = Lists.newArrayList();
        try {
            CSVReader reader = new CSVReader(new FileReader("/Users/ghassen/Desktop/FLAD/Balance_Commune_2016.csv"), '\n');
            String[] nextLine;
            int cursorIndex = 0;
            while ((nextLine = reader.readNext()) != null) {
                if (cursorIndex != 0){
                    balancesList.add(nextLine[0]);
                    log.debug("{} balance(s) ajoutée(s) dans la liste ...", balancesList.size());
                }
                cursorIndex++;
            }
            log.debug("Lecture de toutes les lignes terminé");

            log.debug("Parsing de toutes les lignes");
            for (String line : balancesList){
                String[] lineSeperated = StringUtils.splitByWholeSeparatorPreserveAllTokens(line,";");
                balances.add(balanceLineMapper.toBalance(lineSeperated, cursorIndex));
            }
            log.debug("Job terminé");
        } catch (IOException e) {
            log.error("File not found", e);
        }
        return RepeatStatus.FINISHED;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {

    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        return null;
    }
}
  • I do not know Spring, but you may have to increase its heap settings. -Xmx would be the related command line argument for a vanilla JVM, e.g. -Xmx6G would set the upper limit to 6 gigabytes. Perhaps this way: stackoverflow.com/questions/23072187/… Commented Jul 5, 2018 at 23:28

2 Answers

1

I agree with @AUser. However, let me be more specific. You can replace your parseToDouble function with the standard Double.valueOf(), which should be much more efficient.
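As a rough illustration of that replacement, here is a minimal sketch. The class name AmountParser and the method parseAmount are hypothetical, not part of the original code. It assumes the amounts may use a French decimal comma (the original code used Locale.FRANCE) and contain no grouping separators; Double.valueOf itself only accepts a dot as the decimal separator, so the comma is swapped first.

```java
final class AmountParser {

    // Hypothetical helper: parses "1234,56" or "1234.56" without creating
    // a NumberFormat instance on every call.
    // Assumption: no grouping separators (spaces) inside the number.
    static Double parseAmount(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return null; // empty CSV cell
        }
        try {
            // Double.valueOf expects '.', so normalize a French decimal comma.
            return Double.valueOf(raw.trim().replace(',', '.'));
        } catch (NumberFormatException e) {
            return null; // not a number; the caller decides how to handle it
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAmount("1234,56")); // prints 1234.56
        System.out.println(parseAmount("abc"));     // prints null
    }
}
```

If the file does contain grouping separators, a single NumberFormat instance created once and reused (rather than one per call) would be another option, keeping in mind that NumberFormat is not thread-safe.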

1 Comment

You're right. I removed the method Double parseToDouble(String number) and now it's fine :)
1

You are creating a huge number of instances in a short time (including the strings that you only parse later), and the garbage collector can't keep up. I recommend building the whole pipeline as a stream and parsing only the lines that you will actually need.
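One way to read "stream design" here is sketched below, under stated assumptions: lines are read one at a time, split, and handed to a sink in fixed-size batches, so the full file is never held in memory. The class StreamingCsvSketch, the method processInBatches, and the sink parameter are hypothetical names; in a real job the sink would map rows to Balance objects and persist them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class StreamingCsvSketch {

    // Reads the CSV line by line, skips the header, and passes rows to
    // the sink in batches of at most batchSize. Returns the row count.
    static long processInBatches(BufferedReader reader, int batchSize,
                                 Consumer<List<String[]>> sink) {
        long total = 0;
        try {
            List<String[]> batch = new ArrayList<>(batchSize);
            reader.readLine(); // skip the header line
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty columns
                batch.add(line.split(";", -1));
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    // fresh list, so the flushed batch can be collected
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // flush the last partial batch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String csv = "EXER;IDENT;SD\n2016;A;1,5\n2016;B;2,5\n2016;C;3,5\n";
        List<Integer> batchSizes = new ArrayList<>();
        long rows = processInBatches(
                new BufferedReader(new StringReader(csv)), 2,
                batch -> batchSizes.add(batch.size()));
        // prints "3 rows in batches [2, 1]"
        System.out.println(rows + " rows in batches " + batchSizes);
    }
}
```

Spring Batch's chunk-oriented steps (for example with FlatFileItemReader) are built around the same read, process, write-in-chunks idea, so they may be a better fit than a single Tasklet that loads everything at once.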

1 Comment

My previous solution was to parse each line directly as it was read and add the result to a List<Balance>. And I need all of them, because I will persist them in the database afterwards.
