I am currently developing an application that processes several files, containing around 75,000 records a piece (stored in binary format). When this app is ran (manually, about once a month), about 1 million records are contained entirely with the files. Files are put in a folder, click process and it goes and stores this into a MySQL database (table_1)
The records contain information that needs to be compared to another table (table_2) containing over 700k records.
I have gone about this a few ways:
METHOD 1: Import Now, Process Later
In this method, I would import the data into the database without any processing from the other table. However when I wanted to run a report on the collected data, it would crash assuming memory leak (1 GB used in total before crash).
METHOD 2: Import Now, Use MySQL to Process
This was what I would like to do but in practice it didn't seem to turn out so well. In this I would write the logic in finding the correlations between table_1 and table_2. However the MySQL result is massive and I couldn't get a consistent output, sometimes causing MySQL giving up.
METHOD 3: Import Now, Process Now
I am currently trying this method and although the memory leak is subtle, It still only gets to about 200,000 records before crashing. I have tried numerous forced garbage collections along the way, destroying properly classes, etc. It seems something is fighting me.
I am at my wits end trying to solve the issue with memory leaking / the app crashing. I am no expert in Java and have yet to really deal with very large amounts of data in MySQL. Any guidance would be extremely helpful. I have put thought into these methods:
- Break each line process into individual class, hopefully expunging any memory usage on each line
- Some sort of stored routine where once a line is stored into the database, MySQL does the table_1 <=> table_2 computation and stores the result
But I would like to pose the question to the many skilled Stack Overflow members to learn properly how this should be handled.