I use PHP for a lot of data processing (realizing I'm probably pushing into territory where I should be using other languages and/or techniques).
I'm doing entity extraction with a PHP process that loads an array of ngrams to look for into memory. That array uses 3 GB of memory and takes about 20 seconds to load each time I launch a process. I generate it once locally on the machine, and each process loads it from a .json file. Each process then tokenizes the text it's processing and runs array_intersect between the token array and the ngram array to extract entities.
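Roughly, each process does something like this (a minimal sketch; the file name, tokenizer, and ngram structure are simplified stand-ins for the real ones):

```php
<?php
// Sketch of what each worker process currently does.
// 'ngrams.json' and the whitespace tokenizer are illustrative only.

// Load the pre-generated ngram list: ~3 GB in memory, ~20 s to load.
$ngrams = json_decode(file_get_contents('ngrams.json'), true);

// Tokenize the input text (naive whitespace split for illustration).
$tokens = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

// Entities are the tokens that also appear in the ngram list.
$entities = array_intersect($tokens, $ngrams);
```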
Is there any way to preload this array into memory once on the machine that runs all these processes and then share that resource across all of them?
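To illustrate the kind of sharing I mean (APCu is just an example API here; I don't know whether it can hold a 3 GB array or whether it actually shares memory across CLI processes):

```php
<?php
// Illustrative only: warm the cache once on the machine...
if (!apcu_exists('ngrams')) {
    apcu_store('ngrams', json_decode(file_get_contents('ngrams.json'), true));
}

// ...then each worker process would fetch the shared array
// without paying the ~20 s JSON-load cost on every launch.
$ngrams = apcu_fetch('ngrams');
```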
Since it's probably not possible with PHP: what types of languages and methods should I be researching to do this sort of entity extraction more efficiently?