I have a program that reads a large list of sequences from a file and does a calculation among all of the pairs in that list. It then stores all of these calculations into a hashset. When running this program about halfway through, I get a GC overhead limit error.
I realize this is because the garbage collector is using up 98% of the computation time and is unable to recover even 2% of the heap. Here is the code I have:
ArrayList<String> c = loadSequences("file.txt"); // Loads 60 char DNA sequences
HashSet<DNAPair,Double> LSA = new HashSet<DNAPair,Double>();
for(int i = 0; i < c.size(); i++) {
for(int j = i+1; j < c.size(); j++) {
LSA.put(new DNAPair(c.get(i),c.get(j)),localSeqAlignmentSimilarity(c.get(i),c.get(j)));
}
}
And here's the code for the actual method:
public static double localSeqAlignmentSimilarity(String s1, String s2) {
s1 = " " + s1;
s2 = " " + s2;
int max = 0,h = 0,maxI = 0,maxJ = 0;
int[][] score = new int[61][61];
int[][] pointers = new int[61][61];
for(int i = 1; i < s1.length(); i++) {
pointers[i][0] = 2;
}
for(int i = 1; i < s2.length(); i++) {
pointers[0][i] = 1;
}
boolean inGap = false;
for(int i = 1; i < s1.length(); i++) {
for(int j = 1; j < s2.length(); j++) {
h = -99;
if(score[i-1][j-1] + match(s1.charAt(i),s2.charAt(j)) > h) {
h = score[i-1][j-1] + match(s1.charAt(i),s2.charAt(j));
pointers[i][j] = 3;
inGap = false;
}
if(!inGap) {
if(score[i-1][j] + GAPPENALTY > h) {
h = score[i-1][j] + GAPPENALTY;
pointers[i][j] = 2;
inGap = true;
}
if(score[i][j-1] + GAPPENALTY > h) {
h = score[i][j-1] + GAPPENALTY;
pointers[i][j] = 1;
inGap = true;
}
} else {
if(score[i-1][j] + GAPEXTENSION > h) {
h = score[i-1][j] + GAPEXTENSION;
pointers[i][j] = 2;
inGap = true;
}
if(score[i][j-1] + GAPEXTENSION > h) {
h = score[i][j-1] + GAPEXTENSION;
pointers[i][j] = 1;
inGap = true;
}
}
if(0 > h) h = 0;
score[i][j] = h;
if(h >= max) {
max = h;
maxI = i;
maxJ = j;
}
}
}
double matches = 0;
String o1 = "", o2 = "";
while(!(maxI == 0 && maxJ == 0)) {
if(pointers[maxI][maxJ] == 3) {
o1 += s1.charAt(maxI);
o2 += s2.charAt(maxJ);
maxI--;
maxJ--;
} else if(pointers[maxI][maxJ] == 2) {
o1 += s1.charAt(maxI);
o2 += "_";
maxI--;
} else if(pointers[maxI][maxJ] == 1) {
o1 += "_";
o2 += s2.charAt(maxJ);
maxJ--;
}
}
StringBuilder a = new StringBuilder(o1);
b = new StringBuilder(o2);
o1 = a.reverse().toString();
o2 = b.reverse().toString();
a.setLength(0);
b.setLength(0);
for(int i = 0; i < Math.min(o1.length(), o2.length()); i++) {
if(o1.charAt(i) == o2.charAt(i)) matches++;
}
return matches/Math.min(o1.length(), o2.length());
}
I thought that this was because of all the variables I declare inside the method (the two int arrays and the stringbuilders etc.) creating more and more objects every time the method is run so I changed them all to static fields and cleared them everytime (ex. Arrays.fill(score,0);) instead of creating a new object.
However this didn't help at all and I still got the same error.
Could it be that the hashset that stores all of the calculations is getting too big and is unable to be stored by java? I'm not getting an out of heap space error so it seems kind of strange.
I also changed the command line argument to give more space to the JVM but that didn't seem to help.
Any insight on this problem would be helpful. Thanks!