Realtime database data structure modeling

Question

We have a chat system with an analytics dashboard. Currently we show the most frequently said sentences. The model looks like this:


messages
    --key1
       -text: "who are you"
    --key2
       -text: "hello"
    --key3
       -text: "who are you"

A database trigger fires every time a new message is inserted and stores a count like this:


stat
   --topPhrases
     --keyA
        --phrase: "who are you"
        --count: 2
     --key
        --phrase: "hello"
        --count: 1

Our dashboard queries this data and shows it as the top sentences used.

The problem is that we now need to add a date element. Currently this answers "top said sentences ever by people".

What we now want to answer is "top said sentences today, this week, this month".

So we probably need to store the stat data model differently. Please advise.

You can add a field in stat called date, storing the time when the message was created. Use a query to filter on the timestamp and order by count.
– KeroppiMomo, Commented Jun 8, 2019 at 6:45
That does not sound correct. The "who are you" phrase showing count 2 could be on two different dates, so what date would you store?
– Moblize IT, Commented Jun 8, 2019 at 7:18

Frank van Puffelen · Accepted Answer · 2019-06-08 14:23:28Z

The common recommendation is to store the data that your app needs to display. So if you want to display top sentences for today, for this week, and for this month, that means storing precisely those aggregates: the top sentences by day, week, and month.

A simple model for storing these is to keep your current list, and then add a similar list for each aggregation level and each interval:

stats
   --topPhrases
     --keyA
        --phrase: "who are you"
        --count: 2
     --key
        --phrase: "hello"
        --count: 1
   --topPhrases_byDay
     --20190607
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --20190608
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
   --topPhrases_byWeek
     --201922
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --201923
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
   --topPhrases_byMonth
     --201905
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --201906
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
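
As a minimal sketch of the write side for this model, the bucket keys can be derived from the message timestamp. Everything here is an assumption for illustration: the function names are hypothetical, the timestamp is treated as UTC, weeks follow ISO 8601 numbering, and each phrase gets a stable hash-based key (the trees above use opaque keys such as keyA).

```python
from datetime import datetime, timezone
import hashlib


def bucket_keys(ts: datetime) -> dict:
    """Return the day, week and month bucket keys for one message timestamp."""
    iso_year, iso_week, _ = ts.isocalendar()
    return {
        "topPhrases_byDay": ts.strftime("%Y%m%d"),
        # Assumption: ISO 8601 week numbering, zero-padded to two digits.
        "topPhrases_byWeek": f"{iso_year}{iso_week:02d}",
        "topPhrases_byMonth": ts.strftime("%Y%m"),
    }


def phrase_key(phrase: str) -> str:
    """Hypothetical stable key per phrase, so the same phrase maps to one counter."""
    return hashlib.sha1(phrase.encode("utf-8")).hexdigest()[:16]


def counter_paths(phrase: str, ts: datetime) -> list:
    """List every counter location that one new message has to increment."""
    key = phrase_key(phrase)
    paths = [f"stats/topPhrases/{key}"]  # the existing all-time aggregate
    for node, bucket in bucket_keys(ts).items():
        paths.append(f"stats/{node}/{bucket}/{key}")
    return paths


if __name__ == "__main__":
    when = datetime(2019, 6, 7, 12, 0, tzinfo=timezone.utc)
    for path in counter_paths("who are you", when):
        print(path)
```

Each new message then touches four counters: the all-time one plus one per aggregation level.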

Alternatively, store all aggregations as a single list, and use prefixes to indicate their aggregation level (and the format of the rest of the key):

stats
   --topPhrases
     --keyA
        --phrase: "who are you"
        --count: 2
     --key
        --phrase: "hello"
        --count: 1
     --day_20190607
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --day_20190608
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --week_201922
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --week_201923
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --month_201905
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
     --month_201906
        --keyA
           --phrase: "who are you"
           --count: 2
        --key
           --phrase: "hello"
           --count: 1
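
With prefixed keys, the aggregation level can be read off the key itself. The sketch below is only an illustration: the helper names are hypothetical, weeks are assumed to follow ISO 8601 numbering, and the prefix filter runs over a plain Python list to approximate what a key-range query on the database would select.

```python
from datetime import date


def prefixed_keys(day: date) -> list:
    """Build the day_, week_ and month_ keys that cover one calendar date."""
    iso_year, iso_week, _ = day.isocalendar()
    return [
        f"day_{day:%Y%m%d}",
        f"week_{iso_year}{iso_week:02d}",  # assumption: ISO 8601 week number
        f"month_{day:%Y%m}",
    ]


def keys_with_prefix(keys: list, prefix: str) -> list:
    """Select the keys of one aggregation level, in sorted (chronological) order."""
    return sorted(k for k in keys if k.startswith(prefix))


if __name__ == "__main__":
    stored = prefixed_keys(date(2019, 6, 7)) + prefixed_keys(date(2019, 6, 8))
    print(keys_with_prefix(stored, "day_"))
```

Because the date part is zero-padded, sorting the keys as strings also sorts them chronologically within one prefix.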

You're definitely duplicating a lot of data here, but the advantage of these models is that displaying the stats to a user is now trivial. That's a common trade-off with NoSQL databases: writing data becomes more complex and more (duplicate) data is stored, but reading the data becomes trivial, and thus very scalable.
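
The trade-off can be sketched with an in-memory dictionary standing in for the database. This is not Firebase code: the function names are hypothetical, and counters are keyed by the phrase text purely to keep the example short. It shows that one message costs one write per bucket, while the dashboard reads a single bucket and sorts by count.

```python
def record_message(stats: dict, phrase: str, buckets: list) -> int:
    """Increment the phrase counter in every bucket; return the number of writes."""
    for bucket in buckets:
        entries = stats.setdefault(bucket, {})
        # Simplification: the phrase itself is the key of its counter entry.
        entry = entries.setdefault(phrase, {"phrase": phrase, "count": 0})
        entry["count"] += 1
    return len(buckets)


def top_phrases(bucket: dict, n: int = 10) -> list:
    """Return the n most frequent (phrase, count) pairs of one bucket."""
    ranked = sorted(bucket.values(), key=lambda e: e["count"], reverse=True)
    return [(e["phrase"], e["count"]) for e in ranked[:n]]


if __name__ == "__main__":
    stats = {}
    buckets = ["topPhrases", "day_20190607", "week_201923", "month_201906"]
    for text in ["who are you", "hello", "who are you"]:
        record_message(stats, text, buckets)
    print(top_phrases(stats["day_20190607"], n=2))
```

The read path never scans the messages themselves, which is what keeps the dashboard query cheap as the message count grows.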

I am concerned about the computation here. For each message posted, I do a write to the messages node, then a read and a write to the overall aggregate, then more for storing by day. And every time the day data is updated, the week, month, and year data will be updated too.
That's indeed the process, but I don't understand what the concern is.
lol, the concern is using too much computation, which translates directly into billing.
