I am running an ASP.NET MVC 3 web application and would like to gather statistics such as:

  • How often is a specific product viewed
  • Which search phrases typically return specific products in their result list
  • How often (for specific products) does a search result convert to a view

I would like to aggregate this data and break it down:

  • By product
  • By product by week
  • etc.

I'm wondering what the cleanest and most efficient strategies are for aggregating the data. I can think of a couple, but I'm sure there are many more:

  • Insert the data into a staging table, then run a job to aggregate the data and push it into permanent tables.
  • Use a queuing system (MSMQ/Rhino/etc.) and create a service to aggregate this data before it ever gets pushed to the database.
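The first option can be sketched roughly as follows. This is a minimal illustration, not a production design; all table, column, and parameter names (`ProductViewStaging`, `ProductViewsByWeek`, `@ProductId`) are hypothetical.

```sql
-- Cheap, append-only insert on every product view:
INSERT INTO ProductViewStaging (ProductId, ViewedAt)
VALUES (@ProductId, GETUTCDATE());

-- Scheduled job (e.g. an hourly SQL Agent job): roll the staged rows up
-- into a permanent aggregate table, then clear the staging table.
-- (In practice you would MERGE/upsert into existing week rows rather
-- than blindly inserting.)
BEGIN TRANSACTION;

INSERT INTO ProductViewsByWeek (ProductId, YearNumber, WeekOfYear, ViewCount)
SELECT ProductId,
       DATEPART(YEAR, ViewedAt),
       DATEPART(WEEK, ViewedAt),
       COUNT(*)
FROM ProductViewStaging
GROUP BY ProductId, DATEPART(YEAR, ViewedAt), DATEPART(WEEK, ViewedAt);

DELETE FROM ProductViewStaging;

COMMIT;
```

Running the rollup hourly would satisfy the "accurate to the hour" requirement mentioned below while keeping the permanent tables small.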

My concerns are:

  • I would like to limit the number of moving parts.
  • I would like to reduce impact on the database. The fewer round trips and the less extraneous data stored, the better.
  • In certain scenarios (not listed) I would like the data to be somewhat close to real time (accurate to the hour may be appropriate).

Does anyone have real-world experience with this, and if so, which approach would you suggest, and what are the positives and negatives? If there is a better solution that I am not thinking of, I'd love to hear it.

Thanks

JP

  • Can you explain your reasoning for wanting to reduce impact on the database? If you are worried about affecting the main application database, why not just put it on a second sql server/database? Commented Sep 22, 2011 at 23:35
  • Gabe, viewing and searching for products will be the most frequent actions and hitting the database every time will likely cause a lot of locking and potential slowdown. Storing the data in a second database would not alleviate this problem because I would still have heavy reads and writes on the same table. Using the second database for aggregation purposes only and pushing the aggregated data back to the first DB on a fixed interval is indeed an option. Commented Sep 27, 2011 at 14:41
  • JP, I don't follow. The whole point of moving the aggregate data to a second database would be to prevent "locking and slowdown" when viewing/searching products in your main database - by moving all reads and writes of the aggregate data onto a second box. I can't understand why you think the aggregate data needs to be read from your primary server, and that only writes (and aggregation) should occur on the second server. Commented Sep 27, 2011 at 17:22

3 Answers

I needed to do something similar in a recent project. We implemented a full audit system in a secondary database that tracks changes to every record in the live db. Essentially, every insert, update, and delete actually writes two records: one in the live db and one in the audit db.

Since we have this data in real time in the audit db, we use this second database to fill any reports we might need. One of the tricks I've found when working with a reporting DB is to forget about normalisation. Just create a table for each report you want, and have it carry only the data you need for that report. It duplicates data, but the performance gains are worth it.
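A denormalised per-report table in this style might look like the sketch below. The table and column names are illustrative only, not from the answer; the point is that each report reads from one flat, indexed table with no joins.

```sql
-- One flat table per report; deliberately duplicates data from the
-- live schema (e.g. ProductName) so the report is a single scan.
CREATE TABLE Report_ProductViewsByWeek (
    ProductId   INT           NOT NULL,
    ProductName NVARCHAR(200) NOT NULL,  -- denormalised copy from the product table
    WeekStart   DATE          NOT NULL,
    ViewCount   INT           NOT NULL,  -- product detail views
    SearchHits  INT           NOT NULL,  -- times the product appeared in search results
    Conversions INT           NOT NULL,  -- search result -> product view
    CONSTRAINT PK_Report_ProductViewsByWeek PRIMARY KEY (ProductId, WeekStart)
);
```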

As to filling the actual data in the reports, we use a mixture. Daily reports are generated by a scheduled task at around 3am, ditto for the weekly and monthly reports, normally over weekends or late at night.

Other reports are generated on demand, using mostly the data accumulated since the last daily run, so it's not that many records; once again, all from the secondary database.


I agree that you should create a separate database for your statistics; it will reduce the impact on your main database.

You can go with your idea of having "staging" tables and "aggregate" tables; that way, when you want to access the near-real-time data you go to the staging tables, and when you want historical data, you go to the aggregates.
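Combining the two, a near-real-time read can sum the finished aggregates with whatever is still sitting in staging. A hedged sketch, with all names (`ProductViewsByWeek`, `ProductViewStaging`, `@ProductId`) hypothetical:

```sql
-- Hourly-accurate view count = rolled-up totals + not-yet-aggregated rows.
SELECT SUM(ViewCount) AS TotalViews
FROM (
    SELECT ViewCount
    FROM ProductViewsByWeek
    WHERE ProductId = @ProductId
    UNION ALL
    SELECT COUNT(*)
    FROM ProductViewStaging
    WHERE ProductId = @ProductId
) AS combined;
```

For purely historical reporting, the staging branch can be dropped and only the aggregate table queried.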

Finally, I would recommend using an asynchronous call to save your statistics; that way your pages' response time will not be affected.



I suggest you create a separate database for this. The best way is to use BI techniques; SQL Server ships with separate services for BI (Analysis Services and Integration Services).
