
I have stored objects in my MongoDB (version 3.2) collection with the following schema:

{
    "_id" : ObjectId("585a42b5b7e79d1c0c533f1f"),
    "instanceId" : "i-b385a9bd",
    "DiskSpaceAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 4.32112884521484,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 4.32107543945312,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 4.32101821899414,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "DiskSpaceUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 3.33073806762695,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 3.33079147338867,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 0.753532409667969,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 0.753063201904297,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 19.5049320125989,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 19.5078950721357,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 19.5068086169722,
                "Unit" : "Percent"
            }
        ]
    },
    "DiskSpaceUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 42.9914921714092,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 42.9921815029693,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 42.992920072498,
                "Unit" : "Percent"
            }
        ]
    },
    "SwapUtilization" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:18:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:54:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T13:12:00.000Z"),
                "Average" : 0,
                "Unit" : "Percent"
            }
        ]
    },
    "SwapUsed" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T13:06:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T13:24:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:36:00.000Z"),
                "Average" : 0,
                "Unit" : "Gigabytes"
            }
        ]
    },
    "MemoryAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 3.10872268676758,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 3.10919189453125,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 3.10895538330078,
                "Unit" : "Gigabytes"
            }
        ]
    }
}

I am trying to use MongoDB's aggregation framework, and the following is my query:

db.collectionSchema.aggregate([
    {
     $match :{ "instanceId" : "i-b385a9bd" }
    },
    {
      $unwind : "$DiskSpaceAvailable.Datapoints"   
    },
     {
      $unwind : "$DiskSpaceUtilization.Datapoints"   
    },
    {
      $unwind : "$DiskSpaceUsed.Datapoints"   
    },
    {
      $unwind : "$MemoryUsed.Datapoints"   
    },
    {
      $unwind : "$SwapUtilization.Datapoints"   
    },
    {
      $unwind : "$MemoryAvailable.Datapoints"   
    },
    {
      $unwind : "$MemoryUtilization.Datapoints"   
    },
    {
      $unwind : "$SwapUsed.Datapoints"   
    },
    {
      $group : { _id : "$instanceId" , 
               DiskSpaceAvailable : { "$avg" : "$DiskSpaceAvailable.Datapoints.Average" } , 
               DiskSpaceAvailableUnit : { "$addToSet" : "$DiskSpaceAvailable.Datapoints.Unit" },
               DiskSpaceUtilization : {"$avg" : "$DiskSpaceUtilization.Datapoints.Average"},
               DiskSpaceUtilizationUnit : {"$addToSet" : "$DiskSpaceUtilization.Datapoints.Unit"},
               DiskSpaceUsed : {"$avg" : "$DiskSpaceUsed.Datapoints.Average"},
               DiskSpaceUsedUnit : {"$addToSet" : "$DiskSpaceUsed.Datapoints.Unit"},
               MemoryUsed :{"$avg" : "$MemoryUsed.Datapoints.Average"},
               MemoryUsedUnit:{"$addToSet" : "$MemoryUsed.Datapoints.Unit"},
               SwapUtilization:{"$avg" : "$SwapUtilization.Datapoints.Average"},
               SwapUtilizationUnit:{"$addToSet" : "$SwapUtilization.Datapoints.Unit"},
               MemoryAvailable:{"$avg" : "$MemoryAvailable.Datapoints.Average"},
               MemoryAvailableUnit:{"$addToSet" : "$MemoryAvailable.Datapoints.Unit"},
               MemoryUtilization:{"$avg" : "$MemoryUtilization.Datapoints.Average"},
               MemoryUtilizationUnit: {"$addToSet" : "$MemoryUtilization.Datapoints.Unit"},
               SwapUsed:{"$avg" : "$SwapUsed.Datapoints.Average"},
               SwapUsedUnit: {"$addToSet" : "$SwapUsed.Datapoints.Unit"}
               }  
    },
        {
            $project : { _id:1 , 
              DiskSpaceAvailable:1 , 
              DiskSpaceAvailableUnit : 1,
              DiskSpaceUtilization : 1,
              DiskSpaceUtilizationUnit : 1,
              DiskSpaceUsed : 1,
              DiskSpaceUsedUnit : 1,
              MemoryUsed :1,
              MemoryUsedUnit:1,
              SwapUtilization:1,
              SwapUtilizationUnit:1,
              MemoryAvailable:1,
              MemoryAvailableUnit:1,
              MemoryUtilization:1,
              MemoryUtilizationUnit: 1,
              SwapUsed:1,
              SwapUsedUnit:1
              }
        }
    ]);

This query never returns; it runs indefinitely. With only the first four $unwind stages it works and takes about 3-4 seconds, but after adding the fifth $unwind the query hangs and never completes. I am sure I am doing something wrong but am unable to put a finger on it. Can someone please point out my mistake?
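For context on why chained $unwind stages blow up: each $unwind replaces every document in the pipeline with one copy per array element, so unwinding several independent arrays in sequence produces a Cartesian product of all of them. A sketch in plain JavaScript, using the datapoint counts from the sample document above:

```javascript
// Array lengths taken from the sample document in the question.
const datapointCounts = {
  DiskSpaceAvailable: 3,
  DiskSpaceUsed: 2,
  MemoryUsed: 2,
  MemoryUtilization: 3,
  DiskSpaceUtilization: 3,
  SwapUtilization: 4,
  SwapUsed: 3,
  MemoryAvailable: 3,
};

// Each $unwind multiplies the document count by that array's length,
// so eight chained unwinds yield the product of all eight lengths.
const unwound = Object.values(datapointCounts).reduce((n, c) => n * c, 1);
console.log(unwound); // 3888 intermediate documents from one 23-datapoint document
```

With real monitoring data (hundreds of datapoints per array), this product grows multiplicatively with every additional $unwind, which matches the observed behavior of the fifth stage pushing the query over the edge.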

Any suggestions are most welcome; I am willing to change the schema as well.

Thank you :)

5 Comments
  • Why would it return? This is like running mapReduce on 300 million documents and expecting it to return in 1 millisecond. Commented Dec 22, 2016 at 18:27
  • What are you trying to do? What version of mongod are you on? Commented Dec 22, 2016 at 18:29
  • Any suggestions? Should I use different collections for the different data, then? Commented Dec 22, 2016 at 18:29
  • Version 3.2. I am trying to get the average of the datapoints, with their units, for each metric. Commented Dec 22, 2016 at 18:30
  • @sstyvane You were right; I changed my schema back then. It was a total misunderstanding on my end. Thank you :) Commented Dec 1, 2017 at 11:38

1 Answer


That's a lot of data in a single document. Unwinding this many nested arrays and computing averages over them adds not only to the response time but also to the resources consumed.

To make your aggregate query fast, I suggest you compute the average while inserting each document instead of computing it on retrieval.

E.g., when the first datapoint is added (with average 5), the overall average of DiskSpaceAvailable is 5; when a second datapoint is added (with average 2), the total average becomes (5 + 2) / 2 = 3.5.
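The running average described above can be maintained incrementally without re-reading the whole array. A minimal sketch (the function name is illustrative):

```javascript
// Incrementally fold a new value into an existing average:
// newAvg = (oldAvg * oldCount + value) / (oldCount + 1)
function updateRunningAverage(oldAvg, oldCount, value) {
  return (oldAvg * oldCount + value) / (oldCount + 1);
}

// First datapoint (average 5), then a second (average 2):
let avg = updateRunningAverage(0, 0, 5); // 5
avg = updateRunningAverage(avg, 1, 2);   // (5 + 2) / 2 = 3.5
console.log(avg); // 3.5
```

Note that this requires storing the datapoint count alongside the precomputed average so the old weight is known at each insert.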

The data design would look something like this:

{
    "_id" : ObjectId("585a42b5b7e79d1c0c533f1f"),
    "instanceId" : "i-b385a9bd",
    "DiskSpaceAvailableUnit": "Gigabytes",
    "DiskSpaceAvailableAverage": <The computed average value>,
    "DiskSpaceAvailable" : {
        "Datapoints" : [ 
            {
                "Timestamp" : ISODate("2016-12-20T12:14:00.000Z"),
                "Average" : 4.32112884521484,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:32:00.000Z"),
                "Average" : 4.32107543945312,
                "Unit" : "Gigabytes"
            }, 
            {
                "Timestamp" : ISODate("2016-12-20T12:50:00.000Z"),
                "Average" : 4.32101821899414,
                "Unit" : "Gigabytes"
            }
        ]
    },
    ....
}

This way you just fetch the data without doing any computation, and the response will be much faster than your current query.

Such a structure will increase the computation time and complexity of inserts/updates, but if fast retrieval is of prime importance, you should take it into consideration.
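The insert path for this schema could be sketched as follows. This is a client-side helper, not the answer's own code; the stored count field (e.g. `DiskSpaceAvailableCount`) is an assumption needed to keep the running average exact:

```javascript
// Build a MongoDB update that pushes a new datapoint and refreshes the
// precomputed average. A per-metric count field (e.g.
// "DiskSpaceAvailableCount") is assumed to exist on the document.
function buildDatapointUpdate(metric, doc, datapoint) {
  const count = doc[metric + "Count"] || 0;
  const oldAvg = doc[metric + "Average"] || 0;
  const newAvg = (oldAvg * count + datapoint.Average) / (count + 1);
  return {
    $push: { [metric + ".Datapoints"]: datapoint },
    $set: {
      [metric + "Average"]: newAvg,
      [metric + "Count"]: count + 1,
      [metric + "Unit"]: datapoint.Unit,
    },
  };
}

// Usage (e.g. collection.updateOne({ instanceId: "i-b385a9bd" }, update)):
const update = buildDatapointUpdate(
  "DiskSpaceAvailable",
  { DiskSpaceAvailableAverage: 5, DiskSpaceAvailableCount: 1 },
  { Timestamp: new Date(), Average: 2, Unit: "Gigabytes" }
);
console.log(update.$set.DiskSpaceAvailableAverage); // 3.5
```

Because the read-compute-write sequence is not atomic, concurrent writers to the same document would need application-level coordination.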
