4

I have been tasked with creating a web page which queries a table which has 3 million rows. I don't want to print all of the rows, but instead to create a report which lists the top 10 items which appear in the same basket as another item.

I haven't ever used php to deal with this amount of data before so my question is not regarding the actual coding to make the calculations, but what are the key considerations I should take with regards to Should php or sql do the bulk of the calculation, The page will not be updating any records but only making calculations based on their values and frequency How expensive is this in terms of timeout concerns, hardware. Etc

The page won't be loaded at a huge volume, I'd imagine less than 1000 a day.

2
  • It depends whether you want real time data or update data in batch after particular interval for calculations. Yes, for retrieving data you must use pagination and indexing data on which you are filtering data. The rest would work fine. Commented Dec 5, 2015 at 15:02
  • 1
    You really don't want to be resolving this query against the base data every time you need to run it. Use a batch or on demand cache to populate a new table (properly indexed) representing the combinations. Commented Jul 5, 2017 at 16:12

1 Answer 1

2

I think what you need to pay attention to are

MySQL: proper index created for column you need to do where or join , use explain to see if query are using the right index . You may also want to look at partitions table but for 3 millon I do not think it is necessary unless your data is keep growing fast.

PHP: make sure you are not loading too much data to memory , you can use xdebug to dump memory usage file to check if you have slow down problem.

Sign up to request clarification or add additional context in comments.

3 Comments

on not loading too much data to memory, if I found myself in a situation where I was, would breaking the operation down into steps, printing and then returning, next portion of data be the way to resolve this?
Yes then you need query the data by limit and break it down to different trunk
If someone is trying to resolve the associations between 3 milion records in PHP's runtime memory, then their script deserves to die.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.