I need to find a way to efficiently process a large amount of data in PHP/MySQL. Here's the situation:
I have a database table with, say, one million records. Based on user input from PHP, I need to rank all one million records according to a non-trivial calculation so that I can pick the top-scoring items. My problem is that this scales very poorly from a memory-usage standpoint, particularly at the sorting step, if I rearrange the data into columns and use array_multisort.
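For reference, a stripped-down version of what I'm doing now looks like this (score_row() is a stand-in for the real calculation, and the data set is tiny here just so the sketch runs):

```php
<?php
// Stripped-down version of the current approach: load everything,
// compute a score per row, then sort both arrays together.
// score_row() is a stand-in for the real, non-trivial calculation.
function score_row(array $row, float $weight): float {
    return $row['value'] * $weight;
}

$rows = [
    ['id' => 1, 'value' => 10],
    ['id' => 2, 'value' => 30],
    ['id' => 3, 'value' => 20],
];

$scores = [];
foreach ($rows as $row) {
    $scores[] = score_row($row, 2.0);
}

// This is the step that hurts at one million rows: $rows, $scores,
// and array_multisort's internal copies are all in memory at once.
array_multisort($scores, SORT_DESC, $rows);

$topItems = array_slice($rows, 0, 2); // keep only the top-scoring items
```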
Alternative methods I can think of are:
- Doing the calculations in PHP and inserting the data, along with the scores, into a temporary table, then retrieving the highest-scoring items with a SELECT ... ORDER BY score ... LIMIT query
- Doing the calculations in PHP and writing the data, with the scores, to a CSV file, then calling the command-line sort utility and reading back the top X lines
- Doing the calculations in MySQL in a stored procedure and retrieving the top X items, as in option 1. My concern here is whether the DB is well suited to the number crunching this would involve
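To make option 1 concrete, here's a rough sketch of what I have in mind. I'm using an in-memory SQLite database via PDO purely so the snippet is self-contained; the MySQL version would be the same apart from the DSN, and score_row() again stands in for the real calculation:

```php
<?php
// Sketch of option 1: compute scores in PHP row by row, stage them in a
// temporary table, and let the database do the ORDER BY ... LIMIT.
// SQLite in-memory is used here only to keep the sketch self-contained.
function score_row(array $row, float $weight): float {
    return $row['value'] * $weight; // placeholder for the real calculation
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Stand-in for the existing one-million-row source table.
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)');
$pdo->exec('INSERT INTO items (id, value) VALUES (1, 10), (2, 30), (3, 20)');

$pdo->exec('CREATE TEMPORARY TABLE tmp_scores (item_id INTEGER PRIMARY KEY, score REAL)');

$insert = $pdo->prepare('INSERT INTO tmp_scores (item_id, score) VALUES (?, ?)');

// Stream source rows one at a time so PHP never holds the full data set.
$stmt = $pdo->query('SELECT id, value FROM items');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $insert->execute([$row['id'], score_row($row, 2.0)]);
}

// The sort runs inside the database; PHP only receives the top N rows.
$top = $pdo->query(
    'SELECT item_id, score FROM tmp_scores ORDER BY score DESC LIMIT 2'
)->fetchAll(PDO::FETCH_ASSOC);
```

The appeal here is that only one row is in PHP memory at a time and the sort is the database's problem, but I don't know how the insert of a million scored rows would perform in practice.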
This has got to be a fairly common problem for things like search engines. Scalability is the number one priority, but performance has to be pretty good too. Is one of these approaches best, or is there some other great option that I'm not even considering?