
I'm having trouble computing the scores of all students who answered the questions in a test, using a MongoDB aggregation pipeline.

My aggregation pipeline produces an array of objects made up of every student's answers to the test questions.

The pipeline looks something like the one below; the example is simplified from my actual problem. Basically, I group and push each user's questions array into a scores field, then use $reduce to flatten the scores field:

{ $group: { 
    _id: {}, 
    scores: { $push: "$questions" }
} }, 
{ $addFields: { 
    testScores: {
        $reduce: {
            input: "$scores",
            initialValue: [ ],
            in: { $concatArrays: [ "$$value", "$$this" ] }
        }
    } 
} }

The result would look something like:

testScores: [
    { id: 'questionOne', score: 1 },
    { id: 'questionOne', score: 3 },
    { id: 'questionOne', score: 8 },
    { id: 'questionOne', score: 2 },
    ....
    { id: 'questionFifty', score: 1 },
    { id: 'questionFifty', score: 3 },
    { id: 'questionFifty', score: 8 },
    { id: 'questionFifty', score: 2 }
]

My question is: how do I get the average of all scores for 'questionOne', and likewise for every other question? I cannot unwind my array because I have a large number of tests, and it seems MongoDB cannot unwind that many without returning null for the aggregation result.

In JavaScript I'd use reduce. As far as I understand, MongoDB lets you use variables from outside the $reduce expression but does not let you modify values inside it, so something similar to the function below would not be possible.

myArray.reduce((acc, next) => {
    if (acc[next.id]) {
        // id seen before: add to the running totals
        acc[next.id].score += next.score
        acc[next.id].count += 1
    } else {
        // first time this id is seen: create its entry
        acc[next.id] = { score: next.score, count: 1 }
    }
    acc[next.id].avg = acc[next.id].score / acc[next.id].count
    return acc
}, {})

Thanks for any pointers

  • The pipeline would be something like the one below; my example is simplified from my actual problem. Basically I group and push each questions array for each user into the scores field, then I use $reduce to flatten the scores field: {$group:{ _id: {}, scores:{$push:"$questions"} }}, {$addFields:{ testScores:{$reduce: {input: "$scores",initialValue: [ ],in: { $concatArrays: [ "$$value", "$$this" ] }}} }} Commented Jul 31, 2018 at 13:31

1 Answer


Yes, that's possible with $reduce, but there are two important caveats:

  • inside $let, the vars section can only refer to variables defined in outer scopes; you can't define several variables in the same block and have them refer to each other, which is why this solution needs so much nesting
  • $reduce exposes the $$value variable, which represents the current state of the aggregation. Treat that variable as immutable: you can refer to it, but you can't modify it
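
As a rough plain-JavaScript analogy for the second caveat (a sketch only, using made-up sample data rather than anything from the question), a reducer can build a new accumulator at every step instead of mutating the old one. Freezing each accumulator here just makes the "read but don't modify" rule explicit:

```javascript
// Hypothetical sample data, not taken from the question.
const scores = [
  { id: "q1", score: 1 },
  { id: "q2", score: 5 },
  { id: "q1", score: 3 },
];

// Each step reads the previous accumulator (`value`) and returns a NEW
// frozen object, mirroring how $reduce treats $$value as read-only.
const totals = scores.reduce((value, current) => {
  const previous = value[current.id] ?? 0;
  return Object.freeze({ ...value, [current.id]: previous + current.score });
}, Object.freeze({}));

console.log(totals); // { q1: 4, q2: 5 }
```

The aggregation in this answer follows the same idea: every iteration returns a freshly built array rather than changing $$value in place.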

Then you can try the following aggregation:

db.col.aggregate([
    {
        $project: {
            averages: {
                $reduce: {
                    input: "$testScores",
                    initialValue: [],
                    in: {
                        $let: {
                            vars: {
                                index: { $indexOfArray: [ "$$value.id", "$$this.id" ] }
                            },
                            in: {
                                $let: {
                                    vars: { 
                                        prev: { 
                                            $cond: [ { $ne: [ "$$index", -1 ] }, { $arrayElemAt: [ "$$value", "$$index" ] }, { id: "$$this.id", score: 0, count: 0 } ] 
                                        } 
                                    },
                                    in: {
                                        $let: {
                                            vars: {
                                                updated: {
                                                    id: "$$prev.id",
                                                    score: { $add: [ "$$prev.score", "$$this.score" ] },
                                                    count: { $add: [ "$$prev.count", 1 ] },
                                                    avg: {
                                                        $divide: [ { $add: [ "$$prev.score", "$$this.score" ] }, { $add: [ "$$prev.count", 1 ] } ]
                                                    }
                                                }
                                            },
                                            in: {
                                                $cond: {
                                                    if: { $eq: [ "$$index", -1 ] },
                                                    then: { $concatArrays: [ "$$value", [ "$$updated" ] ] },
                                                    else: { $concatArrays: [ { $slice: [ "$$value", "$$index"] }, [ "$$updated" ], { $slice: [ "$$value", { $add: [ "$$index", 1 ] }, { $size: "$$value" }] } ] }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
])

Each $let defines one step of the algorithm:

  • using $indexOfArray to check whether there's already an aggregate for the id of the currently processed document
  • using $cond and $arrayElemAt to get the previous value for the currently processed id, or to provide a default value
  • then, under updated, we calculate the new values, including the average

To return the value, we have to consider two cases:

  • when there's no previous value, we simply append the updated document to the current array (using $concatArrays)
  • when we have to "update" an existing aggregate, we have to remove the old value and add the new one; $slice can be used here to get the documents before and after the current $$index
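
To make these steps easier to follow, here is a plain-JavaScript sketch of the same logic. It is only an illustration: findIndex, the conditional operator and the spread/slice calls stand in for $indexOfArray, $cond and $concatArrays/$slice, and the sample array assumes just the eight scores listed in the question.

```javascript
// Sample data: the four scores per question shown in the question.
const testScores = [
  { id: "questionOne", score: 1 },
  { id: "questionOne", score: 3 },
  { id: "questionOne", score: 8 },
  { id: "questionOne", score: 2 },
  { id: "questionFifty", score: 1 },
  { id: "questionFifty", score: 3 },
  { id: "questionFifty", score: 8 },
  { id: "questionFifty", score: 2 },
];

const averages = testScores.reduce((value, current) => {
  // Step 1 ($indexOfArray): is there already an aggregate for this id?
  const index = value.findIndex((entry) => entry.id === current.id);

  // Step 2 ($cond + $arrayElemAt): previous aggregate or a default.
  const prev =
    index !== -1 ? value[index] : { id: current.id, score: 0, count: 0 };

  // Step 3 (updated): new totals, including the average.
  const score = prev.score + current.score;
  const count = prev.count + 1;
  const updated = { id: prev.id, score, count, avg: score / count };

  // Result: append when new, otherwise rebuild the array around index.
  return index === -1
    ? [...value, updated]
    : [...value.slice(0, index), updated, ...value.slice(index + 1)];
}, []);

console.log(averages);
```

Note that the accumulator is never mutated; each iteration returns a new array, just as the pipeline does.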

For your example it outputs:

{
    "averages" : [
            {
                    "id" : "questionOne",
                    "score" : 14,
                    "count" : 4,
                    "avg" : 3.5
            },
            {
                    "id" : "questionFifty",
                    "score" : 14,
                    "count" : 4,
                    "avg" : 3.5
            }
    ]
}

2 Comments

Thanks I'll check this out
Miki, I take my hat off to you. The nesting alone fried my brain, as does the repetition if you have to do this for several collections of data. It appears very cumbersome compared to a JavaScript solution, but I am very new to this and much more familiar with JS. I tried your solution out but got the error 'A pipeline stage specification object must contain exactly one field.' With my limited experience I did not resolve this. In the end I used a partial aggregation pipeline (to pull out averages and arrays of scores) plus a Node solution (to easily reduce arrays, get quartile values, etc.).
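
The failing pipeline isn't shown, so the actual cause of that error is unknown. One way to get that message is a stage object that holds more than one top-level operator (or none at all), for example when two stages are merged into one object while combining pipelines. The sketch below is only a local sanity check under that assumption; findInvalidStages is a hypothetical helper, not a MongoDB API, and the $project stage is a placeholder for the one in the answer.

```javascript
// Hypothetical helper (not part of MongoDB): returns the indexes of
// stages that do not have exactly one top-level key.
function findInvalidStages(pipeline) {
  return pipeline
    .map((stage, i) => (Object.keys(stage).length === 1 ? -1 : i))
    .filter((i) => i !== -1);
}

// The flattening expression from the question.
const flatten = {
  $reduce: {
    input: "$scores",
    initialValue: [],
    in: { $concatArrays: ["$$value", "$$this"] },
  },
};

// Well-formed: every stage object has exactly one operator.
const separateStages = [
  { $group: { _id: {}, scores: { $push: "$questions" } } },
  { $addFields: { testScores: flatten } },
  // Placeholder for the answer's $project stage; its $reduce body is omitted.
  { $project: { averages: "$testScores" } },
];

// Malformed: $group and $addFields merged into a single stage object.
const mergedStages = [
  {
    $group: { _id: {}, scores: { $push: "$questions" } },
    $addFields: { testScores: flatten },
  },
  { $project: { averages: "$testScores" } },
];

console.log(findInvalidStages(separateStages)); // []
console.log(findInvalidStages(mergedStages)); // [ 0 ]
```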
