I'm currently working on a database with the following structure:

{"_id" : ObjectId("1abc2"),
"startdatetime" : ISODate("2016-09-11T18:00:37Z"),
"diveValues" : [
    {
        "temp" : 15.269,
        "depth" : 0.0,
    },
    {
        "temp" : 14.779257384,
        "depth" : 1.0,
    },
    {
        "temp" : 14.3940253165,
        "depth" : 2.0,
    },
    {
        "temp" : 13.9225795455,
        "depth" : 3.0,
    },
    {
        "temp" : 13.8214431818,
        "depth" : 4.0,
    },
    {
        "temp" : 13.6899553571,
        "depth" : 5.0,
    }
]}

The database holds the water temperature at each depth (in metres) of a dive, stored in the "diveValues" array. I have managed to average over all depths between two dates, both as a monthly and as a daily average. What I'm having serious trouble with is getting the average between two depths, say between 1 and 4 metres, for every month of the last 6 months.

Here is an example of average temperature for each month from January to June, for all depths:

db.collection.aggregate(
  [
    {$unwind: "$diveValues"},
    {$match:
      {'startdatetime':
        {$gt: new ISODate("2016-01-10T06:00:29Z"),
         $lt: new ISODate("2016-06-10T06:00:29Z")}
      }
    },
    {$group:
      {_id:
        {year: {$year: "$startdatetime"},
         month: {$month: "$startdatetime"}},
       avgTemp: {$avg: "$diveValues.temp"}}
    },
    {$sort: {_id: 1}}
  ]
)

Resulting in:

{ "_id" : { "year" : 2016, "month" : 1 }, "avgTemp" : 7.575706502958313 }
{ "_id" : { "year" : 2016, "month" : 3 }, "avgTemp" : 6.85037457740135 }
{ "_id" : { "year" : 2016, "month" : 4 }, "avgTemp" : 7.215702831902588 }
{ "_id" : { "year" : 2016, "month" : 5 }, "avgTemp" : 9.153453683614638 }
{ "_id" : { "year" : 2016, "month" : 6 }, "avgTemp" : 11.497953009390237 }

Now, I cannot figure out how to get the average temperature between 1 and 4 metres for the same period.

I have been trying to group the values by the wanted depths, but have not managed it, more often than not ending up with bad syntax. Also, if I'm not wrong, a $match stage would return all depths as long as the dive has values for 1 and 4 metres, so that will not work.

With find() I am using $slice to return the values I want from the array, but I have not managed to do the same with aggregate().

Is there a way to solve this? Thanks in advance, much appreciated!

2 Answers

You should place a $match stage before $unwind to optimize the aggregation. Running $unwind on the whole collection can cause performance problems: it produces a copy of each document per array entry, which uses more memory (pipeline stages are memory-capped: older releases failed above 10% of system RAM, and since MongoDB 2.6 each stage is limited to 100 MB unless allowDiskUse is enabled) and takes time both to flatten the arrays and to process the result. Hence it's better to limit the number of documents that get flattened. The depth filter is then repeated after $unwind, because the first $match selects whole documents and leaves their arrays untouched.

db.collection.aggregate([
    {
        "$match": {
            "startdatetime": {
                "$gt": new ISODate("2016-01-10T06:00:29Z"),
                "$lt": new ISODate("2016-06-10T06:00:29Z")
            }, 
            "diveValues.depth": { "$gte": 1, "$lte": 4 } 
        } 
    },
    { "$unwind": "$diveValues" },
    { "$match": { "diveValues.depth": { "$gte": 1, "$lte": 4 } } },
    { 
        "$group": {
            "_id": { 
                "year": { "$year": "$startdatetime" }, 
                "month": { "$month": "$startdatetime" }
            },
            "avgTemp": { "$avg": "$diveValues.temp" }
        }
    }
])
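
To see why the depth filter has to be repeated after $unwind, here is a plain JavaScript sketch (not MongoDB code) that mimics both steps on the sample document from the question. The names dive, inRange and unwind are made up for illustration, and the document-level check is a simplification of how $match treats array fields.

```javascript
// Plain JavaScript emulation, runnable in Node; no MongoDB calls are involved.
// `dive`, `inRange` and `unwind` are illustrative names, not MongoDB APIs.
const dive = {
  startdatetime: new Date("2016-09-11T18:00:37Z"),
  diveValues: [
    { temp: 15.269, depth: 0.0 },
    { temp: 14.779257384, depth: 1.0 },
    { temp: 14.3940253165, depth: 2.0 },
    { temp: 13.9225795455, depth: 3.0 },
    { temp: 13.8214431818, depth: 4.0 },
    { temp: 13.6899553571, depth: 5.0 }
  ]
};

// True when a single reading lies between 1 and 4 metres (inclusive).
const inRange = (v) => v.depth >= 1 && v.depth <= 4;

// Document-level check (simplified): the dive qualifies as soon as one
// reading is in range, yet it still carries all six readings.
const docMatches = dive.diveValues.some(inRange);

// Emulates $unwind: one record per array element.
const unwind = (doc) =>
  doc.diveValues.map((v) => ({ startdatetime: doc.startdatetime, diveValues: v }));

// Element-level filter after the unwind: only the 1-4 m readings remain.
const kept = unwind(dive).filter((d) => inRange(d.diveValues));

// Average temperature over the remaining readings.
const avgTemp = kept.reduce((sum, d) => sum + d.diveValues.temp, 0) / kept.length;

console.log(docMatches, dive.diveValues.length, kept.length, avgTemp);
```

The first check only decides whether the dive enters the pipeline; the readings at 0 and 5 metres are dropped only by the filter applied to the unwound records.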

If you want the results to contain the average temperature for all depths as well as for the 1-4 metre range, run the following pipeline instead. It uses the $cond ternary operator to pass $avg the temperature only when the depth is within the range, and null otherwise; $avg ignores non-numeric values such as null, so out-of-range readings do not affect that average:

db.collection.aggregate([
    {
        "$match": {
            "startdatetime": {
                "$gt": new ISODate("2016-01-10T06:00:29Z"),
                "$lt": new ISODate("2016-06-10T06:00:29Z")
            }
        }
    },     
    { "$unwind": "$diveValues" },
    { 
        "$group": {
            "_id": { 
                "year": { "$year": "$startdatetime" }, 
                "month": { "$month": "$startdatetime" }
            },
            "avgTemp": { "$avg": "$diveValues.temp" },
            "avgTempDepth1-4": {
                "$avg": {
                    "$cond": [
                        { 
                            "$and": [
                                { "$gte": [ "$diveValues.depth", 1 ] }, 
                                { "$lte": [ "$diveValues.depth", 4 ] }
                            ]
                        },
                        "$diveValues.temp",
                        null                        
                    ]
                }
            }
        }
    }
])
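
The reason null is used as the "else" value can be shown with a small plain JavaScript sketch (again not MongoDB code). It assumes, as the MongoDB documentation describes, that $avg skips non-numeric values; the helper avgIgnoringNonNumbers is a made-up stand-in for that behaviour.

```javascript
// Plain JavaScript emulation of how $avg treats the output of $cond.
// `avgIgnoringNonNumbers` is a hypothetical helper, not a MongoDB API.
const readings = [
  { temp: 15.269, depth: 0.0 },
  { temp: 14.779257384, depth: 1.0 },
  { temp: 14.3940253165, depth: 2.0 },
  { temp: 13.9225795455, depth: 3.0 },
  { temp: 13.8214431818, depth: 4.0 },
  { temp: 13.6899553571, depth: 5.0 }
];

// Averages only the numeric entries; returns null when there are none.
const avgIgnoringNonNumbers = (values) => {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
};

// Emulates the $cond: the temperature inside the range, null outside it.
const condValues = readings.map((r) =>
  r.depth >= 1 && r.depth <= 4 ? r.temp : null
);

const avgAll = avgIgnoringNonNumbers(readings.map((r) => r.temp));
const avgRange = avgIgnoringNonNumbers(condValues);

// For contrast: 0 instead of null is counted as a reading and skews the result.
const avgWithZero = avgIgnoringNonNumbers(condValues.map((v) => (v === null ? 0 : v)));

console.log({ avgAll, avgRange, avgWithZero });
```

With null, only the four in-range readings are counted; with 0 as the "else" value, all six entries would be counted and the average would be pulled down.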

3 Comments

Thanks a lot, this worked out. And thank you very much for the tip about placing the match before the unwind.
@Vegar No worries. Just out of curiosity, which pipeline did you end up using, the first or the last one?
For my needs, the first one suits me best. But I'm very thankful that you wrote them both, because I find it highly educational.

First of all, the $match stage on the date should come at the beginning of the pipeline so that an index on startdatetime can be used.

Now, to the question, you just need to filter the depth interval like you did with the dates:

db.col.aggregate([
    {"$match": {
        'startdatetime': {
            "$gt": new ISODate("2016-01-10T06:00:29Z"),
            "$lt": new ISODate("2016-11-10T06:00:29Z")
        }
    }},
    {"$unwind": "$diveValues"},
    {"$match": {
        "diveValues.depth": {
            "$gte": 1.0,
            "$lte": 4.0  // inclusive, so readings at exactly 4 metres are counted
        }
    }},
    {"$group": {
        "_id": { 
            "year": {"$year": "$startdatetime" }, 
            "month": {"$month": "$startdatetime" }
        },
        "avgTemp": { "$avg": "$diveValues.temp" }}
    }
])

This will give you the average only for the chosen depth interval.
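
As a possible variation (not what the pipeline above does), the array can be trimmed with the $filter operator before it is unwound, which is the closest aggregation counterpart to the $slice approach mentioned in the question. This is only a sketch: it assumes MongoDB 3.2 or later, uses the question's date range with an inclusive 1 to 4 metre interval, and the variable names pipeline and v are arbitrary.

```javascript
// Sketch of an alternative pipeline; assumes MongoDB 3.2+ for $filter.
// Plain Date objects are used so the definition also parses outside the shell.
const pipeline = [
  // Date filter first, so an index on startdatetime can be used.
  { $match: {
      startdatetime: {
        $gt: new Date("2016-01-10T06:00:29Z"),
        $lt: new Date("2016-06-10T06:00:29Z")
      }
  } },
  // Keep only the readings between 1 and 4 metres inside each document.
  { $project: {
      startdatetime: 1,
      diveValues: {
        $filter: {
          input: "$diveValues",
          as: "v",
          cond: { $and: [
            { $gte: ["$$v.depth", 1] },
            { $lte: ["$$v.depth", 4] }
          ] }
        }
      }
  } },
  // Flatten the already-trimmed arrays, then average per month.
  { $unwind: "$diveValues" },
  { $group: {
      _id: {
        year: { $year: "$startdatetime" },
        month: { $month: "$startdatetime" }
      },
      avgTemp: { $avg: "$diveValues.temp" }
  } },
  { $sort: { _id: 1 } }
];

// In the mongo shell this would be run as:
// db.col.aggregate(pipeline)
```

Because the arrays are trimmed first, $unwind produces fewer intermediate documents; whether that is noticeably faster depends on the data.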

1 Comment

Thanks a lot, this looks like it works. Also, thanks for the tip about the placement of the unwind.
