-1

We have daily 200-300 live projects which receive online traffic from our vendors across all our live projects. We store this data in two different DB (I know this sounds bad, but seemed good at the time of designing for various reason):

  • project information in - projects table of MySQL
  • traffic data for those projects in MongoDB collection.

Now, we have to generate analytical data out of these data that we store like on a given date-range - Revenue Earned, Profit, Top Vendors, Top Projects(with most conversion), etc... This data has to be stored daily somewhere so that we can query it by Date Range.

The only problem is that one project traffic is divided between multiple vendors. So this kind of makes it nested data because for a project we can have 10 vendors records.

What's the best way to store this kind of data so that I can query it easily for my representation purpose?

UPDATE

The Project Table contains details about project like -

Project_Code | Client Code | Sales Manager |Payout | Status | Country Code ...

The Traffics collection contains detail about the traffic on a particular project Like -

UUID | Project_Code | Vendor Code | Start Time | End Time | Conversion Status |...

When we have to pull project-wise stats we fetch all the projects and then sum up the traffic for corresponding project_code so that we can get

Project Code | Total Starts(sum of traffic for that project code) | Conversion (Sum of Traffic with conversion_status=1) | Incidence Length (Avg of Median(Endtime - Start time))

So this is what we have so far in our DB


Now what we want is to daily mine this data so that we can calculate

(Given Date Range - Like Last 7 Days, Today, This Month)

First View. Project Code | Revenue Earned (Conversion * Payout) | Cost to Company (sum(Vendor Conversion * Vendor Cost)) | Profit (Revenue - Cost)

Second View. First view but where Sales Manager = "John"

Third View. Top n vendors | Top n Projects | ...

Fourth View. Same as the Third View but where Sales Manager = "John"

So far we are using Laravel, Mysql, MongoDB, Redis as tech stack.

1
  • @DocBrown Thanks for the feedback, I've updated my question. Let me know if it is understandable now. Commented Feb 27, 2020 at 13:18

1 Answer 1

1

The standard approach for solving such problems is to create a special database for analytics (a.k.a OLAP database). Such a database is filled in regular intervals from the online production databases or other datasources, and optimized for the analytical queries one intends to apply.

The data model of such a database has usually the form of a star schema, with one fact table in the middle and several dimension tables around it. In your example, the fact table could be the "Traffics" table, with attributes "costs" and "revenue", and dimensions

  • Product

  • Vendor

  • Sales Manager

  • Time

Of course, you need to find a way to break down costs and revenues to the granularity of this fact table, and you have to think about how granular you need the time dimension, but that is "the exercise left to the reader".

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.