I have a 60 MB text file that my program searches for a specific ID in order to extract some related text, and I have to repeat the process for 200+ IDs. Initially, I looped through the lines of the file looking for each ID and then extracted the related text, but that takes far too long (~2 min). So now I am looking for a way to load the entire file into memory and then search for my IDs and their associated text from there; I imagine that should be faster than accessing the hard drive 200+ times. I wrote the following code to load the file into memory:
    public String createLocalFile(String path)
    {
        String text = "";
        try
        {
            FileReader fileReader = new FileReader( path );
            BufferedReader reader = new BufferedReader( fileReader );
            String currentLine = "";
            while( (currentLine = reader.readLine() ) != null )
            {
                text += currentLine;
                System.out.println( currentLine );
            }
        }
        catch(IOException ex)
        {
            System.out.println(ex.getMessage());
        }
        return text;
    }
Unfortunately, saving the file's text into a String variable takes an extremely long time. How can I load the file faster? Or is there a better way to accomplish the same task? Thanks for any help.
Edit: Here is the link to the file https://github.com/MVZSEQ/denovoTranscriptomeMarkerDevelopment/blob/master/Homo_sapiens.GRCh38.pep.all.fa
A typical entry looks like:
>ENSP00000471873 pep:putative chromosome:GRCh38:19:49496434:49499689:1 gene:ENSG00000142534 transcript:ENST00000594493 gene_biotype:protein_coding transcript_biotype:protein_coding
MKMQRTIVIRRDYLHYIRKYNRFEKRHKNMSVHLSPCFRDVQIGDIVTVGECRPLSKTVR
FNVLKVTKAAGTKKQFQKF
Where ENSP00000471873 is the ID and the text I would be extracting is
MKMQRTIVIRRDYLHYIRKYNRFEKRHKNMSVHLSPCFRDVQIGDIVTVGECRPLSKTVR
FNVLKVTKAAGTKKQFQKF
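Given the entry layout described above (a header line starting with `>` followed by one or more sequence lines), one possible alternative to holding the whole file in a single String is to read it once and index the sequences by ID, so that each of the 200+ lookups becomes a map access. The following is only a minimal sketch under some assumptions: the class name `FastaIndex` and the method `index` are hypothetical, the ID is taken to be everything between `>` and the first space, and the sequence lines of an entry are joined without a separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original question.
public class FastaIndex {

    // Reads the source once and maps each ID (the token after '>') to its sequence.
    public static Map<String, String> index(Reader source) throws IOException {
        Map<String, String> sequences = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String id = null;
            StringBuilder sequence = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    // A new header closes the previous entry, if there was one.
                    if (id != null) {
                        sequences.put(id, sequence.toString());
                    }
                    // Assumption: the ID ends at the first space of the header line.
                    int space = line.indexOf(' ');
                    id = (space < 0) ? line.substring(1) : line.substring(1, space);
                    sequence.setLength(0);
                } else if (id != null) {
                    // Assumption: sequence lines are joined without a separator.
                    sequence.append(line.trim());
                }
            }
            // Store the last entry, which has no following header to close it.
            if (id != null) {
                sequences.put(id, sequence.toString());
            }
        }
        return sequences;
    }

    public static void main(String[] args) throws IOException {
        // Shortened header line; the sequence lines are the ones from the example above.
        String sample = ">ENSP00000471873 pep:putative chromosome:GRCh38:19\n"
                + "MKMQRTIVIRRDYLHYIRKYNRFEKRHKNMSVHLSPCFRDVQIGDIVTVGECRPLSKTVR\n"
                + "FNVLKVTKAAGTKKQFQKF\n";
        Map<String, String> sequences = index(new StringReader(sample));
        System.out.println(sequences.get("ENSP00000471873"));
    }
}
```

For the real file, a `java.io.FileReader` for the path could be passed to `index` instead of the `StringReader`. Whether the joined sequence or the original line breaks are wanted depends on what the extracted text is used for.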
Use a StringBuilder instead of string concatenation (the compiler may already be converting your code to use one).
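As a rough illustration of that suggestion, here is a minimal sketch of the loading loop rewritten around a single `StringBuilder`. The class name `FileLoader` and the method `load` are hypothetical, and the sketch assumes the file can be decoded as UTF-8. It also differs from the code in the question in two ways: it does not print every line to the console, and it keeps a newline after each line so that entry boundaries survive in the resulting String.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original question.
public class FileLoader {

    // Reads the whole file into one String, appending to a single StringBuilder.
    public static String load(Path path) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        // Assumption: the file is valid UTF-8 (plain ASCII qualifies).
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // append() extends the existing buffer instead of copying
                // the accumulated text into a new String on every iteration.
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FileLoader <path>");
            return;
        }
        String text = load(Paths.get(args[0]));
        System.out.println(text.length() + " characters loaded");
    }
}
```

Repeated `text += currentLine` copies everything accumulated so far on each iteration, so the total work grows roughly quadratically with the file size; appending to one builder avoids that copying.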