I'm trying to use PyMongo's aggregate with $group to compute element-wise averages of arrays, and I cannot find any examples that match my problem.

Data example

{
    Subject: "Dave",
    Strength: [1,2,3,4]
},
{
    Subject: "Dave",
    Strength: [1,2,3,5]
},
{
    Subject: "Dave",
    Strength: [1,2,3,6]
},
{
    Subject: "Stuart",
    Strength: [4,5,6,7]
},
{
    Subject: "Stuart",
    Strength: [6,5,6,7]
},
{
    Subject: "Kevin",
    Strength: [1,2,3,4]
},
{
    Subject: "Kevin",
    Strength: [9,4,3,4]
}

Wanted results

{
    Subject: "Dave",
    mean_strength: [1,2,3,5]
},
{
    Subject: "Stuart",
    mean_strength: [5,5,6,7]
},
{
    Subject: "Kevin",
    mean_strength: [5,3,3,4]
}

I have tried this approach, but MongoDB seems to interpret the arrays as null:

pipe = [{'$group': {'_id': 'Subject', 'mean_strength': {'$avg': '$Strength'}}}]
results = db.Walk.aggregate(pipeline=pipe)

Out: [{'_id': 'SubjectID', 'total': None}]

I've looked through the MongoDB documentation, but I cannot tell whether there is a way to do this.

  • When grouping, always prefix field paths with $; here that means "_id": "$Subject". Commented Dec 18, 2017 at 14:29
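
A note on why the attempt above yields None: without the $ prefix, 'Subject' is a literal string, so every document lands in one group, and inside $group, $avg treats an array operand as non-numeric and returns null. To cross-check whatever pipeline you settle on, the expected element-wise means can be computed client-side. The following is only a plain-Python sketch, not a server-side solution; `docs` and `elementwise_means` are illustrative names, and it assumes all arrays for a subject have the same length.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"Subject": "Dave", "Strength": [1, 2, 3, 4]},
    {"Subject": "Dave", "Strength": [1, 2, 3, 5]},
    {"Subject": "Dave", "Strength": [1, 2, 3, 6]},
    {"Subject": "Stuart", "Strength": [4, 5, 6, 7]},
    {"Subject": "Stuart", "Strength": [6, 5, 6, 7]},
    {"Subject": "Kevin", "Strength": [1, 2, 3, 4]},
    {"Subject": "Kevin", "Strength": [9, 4, 3, 4]},
]


def elementwise_means(documents):
    """Group Strength arrays by Subject and average them position by position."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["Subject"]].append(doc["Strength"])
    # zip(*arrays) yields one tuple per array position
    # (assumption: equal-length arrays; zip stops at the shortest one).
    return {
        subject: [sum(column) / len(column) for column in zip(*arrays)]
        for subject, arrays in grouped.items()
    }


expected = elementwise_means(docs)
```

The resulting dictionary maps each Subject to its list of per-position means, which can be compared against the output of the aggregation.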

3 Answers

4

You could use $unwind with includeArrayIndex. As the name suggests, includeArrayIndex adds each element's array index to the output documents, which allows grouping by Subject and by position in Strength. After calculating the averages, the results must be sorted so that the second $group, with $push, reassembles them in the right order. Finally, a $project includes and renames the relevant fields.

db.test.aggregate([{
        "$unwind": {
            "path": "$Strength",
            "includeArrayIndex": "rownum"
        }
    },
    {
        "$group": {
            "_id": {
                "Subject": "$Subject",
                "rownum": "$rownum"
            },
            "mean_strength": {
                "$avg": "$Strength"
            }
        }
    },
    {
        "$sort": {
            "_id.Subject": 1,
            "_id.rownum": 1
        }
    },
    {
        "$group": {
            "_id": "$_id.Subject",
            "mean_strength": {
                "$push": "$mean_strength"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "Subject": "$_id",
            "mean_strength": 1
        }
    }
])

For your test input, this returns:

{ "mean_strength" : [ 5, 5, 6, 7 ], "Subject" : "Stuart" }
{ "mean_strength" : [ 5, 3, 3, 4 ], "Subject" : "Kevin" }
{ "mean_strength" : [ 1, 2, 3, 5 ], "Subject" : "Dave" }
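
Since the question is about PyMongo, here is a sketch of how the same pipeline might be wrapped in Python. The helper names `mean_strength_pipeline` and `mean_strength_by_subject` are made up for illustration; the only PyMongo call assumed is `Collection.aggregate`, and no connection is opened here.

```python
def mean_strength_pipeline():
    """Return the aggregation pipeline as plain Python dicts (all keys quoted)."""
    return [
        # One document per array element, tagged with its position.
        {"$unwind": {"path": "$Strength", "includeArrayIndex": "rownum"}},
        # Average per (Subject, position).
        {"$group": {
            "_id": {"Subject": "$Subject", "rownum": "$rownum"},
            "mean_strength": {"$avg": "$Strength"},
        }},
        # Restore positional order before rebuilding the arrays.
        {"$sort": {"_id.Subject": 1, "_id.rownum": 1}},
        # Collect the averages back into one array per Subject.
        {"$group": {
            "_id": "$_id.Subject",
            "mean_strength": {"$push": "$mean_strength"},
        }},
        {"$project": {"_id": 0, "Subject": "$_id", "mean_strength": 1}},
    ]


def mean_strength_by_subject(collection):
    """Run the pipeline on a PyMongo collection and return the results as a list."""
    return list(collection.aggregate(mean_strength_pipeline()))
```

With a live connection this could be called as mean_strength_by_subject(db.Walk), where db.Walk is the collection from the question.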

In one of my last edits I added quotes (") around the keys so that this code runs in PyMongo. The Mongo shell is more forgiving about unquoted keys.
1

You can try the aggregation below.

The first $group pushes all of a subject's arrays into one array. For example, Dave has [[1,2,3,4], [1,2,3,5], [1,2,3,6]] after the group stage.

Reduce function - Collects the values found at each index into their own array. Here is how the accumulated value changes for Dave:

Pass     Current Value (c)  Accumulated Value (b)        Next Value
First:   [1,2,3,5]          [[1],[2],[3],[4]]            [[1,1],[2,2],[3,3],[5,4]]
Second:  [1,2,3,6]          [[1,1],[2,2],[3,3],[5,4]]    [[1,1,1],[2,2,2],[3,3,3],[6,5,4]]

Map function - Calculates the average of each collected array from the reduce stage to output [1,2,3,5].

[{"$group":{"_id":"$Subject","Strength":{"$push":"$Strength"}}}, //Push all arrays
 {"$project":{"mean_strength":{
   "$map":{//Calculate avg for each reduced indexed pairs.
     "input":{
       "$reduce":{
         "input":{"$slice":["$Strength",1,{"$subtract":[{"$size":"$Strength"},1]}]}, //Start from second array.
         "initialValue":{ //Initialize to the first array with all elements transformed to array of single values.
           "$map":{
             "input":{"$range":[0,{"$size":{"$arrayElemAt":["$Strength",0]}}]},
             "as":"a",
             "in":[{"$arrayElemAt":[{"$arrayElemAt":["$Strength",0]},"$$a"]}]
           }
         },
         "in":{
           "$let":{"vars":{"c":"$$this","b":"$$value"}, //Create variables for current and accumulated values
             "in":{"$map":{ //Creates map of same indexed values from each iteration 
                 "input":{"$range":[0,{"$size":"$$b"}]},
                 "as":"d",
                 "in":{
                   "$concatArrays":[ //Concat values at same index 
                     [{"$arrayElemAt":["$$c","$$d"]}], //current, wrapped in an array for $concatArrays
                     {"$arrayElemAt":["$$b","$$d"]} //accumulated, already an array
                  ]
                 }
               }
             }
           }
         }
       }
     },
    "as":"e",
    "in":{"$avg":"$$e"}
   }
 }}}
]
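
To make the reduce-then-map idea easier to follow, here is the same logic traced in plain Python. This is only an illustration of the data flow, not something MongoDB runs; `mean_by_position` is a made-up name, and the sketch assumes at least one array and equal array lengths.

```python
from functools import reduce


def mean_by_position(arrays):
    """Mirror the $reduce/$map stages: collect same-index values, then average."""
    first, rest = arrays[0], arrays[1:]
    # initialValue: wrap each element of the first array in its own list.
    initial = [[x] for x in first]
    # Each pass prepends the current array's d-th value to the d-th collected list.
    columns = reduce(
        lambda acc, cur: [[cur[d]] + acc[d] for d in range(len(acc))],
        rest,
        initial,
    )
    # Outer $map: average each collected list.
    return [sum(col) / len(col) for col in columns]


dave = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 3, 6]]
print(mean_by_position(dave))  # [1.0, 2.0, 3.0, 5.0]
```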

0

Based on the description in the question, try the following aggregation query:

db.collection.aggregate(

  // Pipeline
  [
    // Stage 1
    {
      $unwind: { path: "$Strength", includeArrayIndex: "arrayIndex" }   
    },

    // Stage 2
    {
      $group: {
        _id:{Subject:'$Subject',arrayIndex:'$arrayIndex'},
        mean_strength:{$avg:'$Strength'}
      }
    },

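    // Stage 3: sort by array index so that $push rebuilds each array in order
    {
      $sort: { '_id.Subject': 1, '_id.arrayIndex': 1 }
    },

    // Stage 4
    {
      $group: {
      _id:{'Subject':'$_id.Subject'},
      mean_strength:{$push:'$mean_strength'}
      }
    },

    // Stage 5
    {
      $project: {
      Subject:'$_id.Subject',
      mean_strength:'$mean_strength',
      _id:0
      }
    }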

  ]


);
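
One point worth keeping in mind with this kind of pipeline: $group does not guarantee the order of its output documents, so $push collects the per-index averages in whatever order they arrive. The plain-Python sketch below illustrates why sorting on the array index before the second $group matters. The names and the shuffled input are illustrative stand-ins, not actual MongoDB output.

```python
import random

# Output of the first $group for one subject: (array index, mean) pairs.
per_index = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 5.0)]


def push_means(rows, sort_first):
    """Emulate $push, optionally preceded by a $sort on the array index."""
    if sort_first:
        rows = sorted(rows, key=lambda row: row[0])
    return [mean for _, mean in rows]


# Stand-in for the unspecified order in which $group may emit documents.
shuffled = per_index[:]
random.Random(0).shuffle(shuffled)

restored = push_means(shuffled, sort_first=True)   # always in index order
unsorted = push_means(shuffled, sort_first=False)  # order depends on arrival
```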
