MongoDB aggregate queries vs. MySQL SELECT field1 FROM table

Question

I am completely new to MongoDB and wanted to compare the query performance of a NoSQL data model with that of its relational database counterpart. I wrote this in the MongoDB shell:

// Make 10 businesses
// Each business has 10 locations
// Each location has 10 departments
// Each department has 10 teams
// Each team has 100 employees
(new Array(10)).fill(0).forEach(_=>
    db.businesses.insert({
        "name":"Business Name",
        "locations":(new Array(10)).fill(0).map(_=>({
            "name":"Office Location",
            "departments":(new Array(10)).fill(0).map(_=>({
                "name":"Department",
                "teams":(new Array(10)).fill(0).map(_=>({
                    "name":"Team Name",
                    "employees":(new Array(100)).fill(0).map(_=>({
                        "age":Math.floor(Math.random()*100)
                    }))
                }))
            }))
        }))
    })
);

Then I attempted the equivalent of MySQL's EXPLAIN SELECT age,name,(and a few other fields) FROM employees WHERE age >= 50 ORDER BY age DESC by writing this statement:

db.businesses.aggregate([
    { $unwind: "$locations" },
    { $unwind: "$locations.departments" },
    { $unwind: "$locations.departments.teams" },
    { $unwind: "$locations.departments.teams.employees" },
    { $project: { _id: 0, age: "$locations.departments.teams.employees.age" } },
    { $match: { "age": { $gte: 50 }} },
    { $sort: {"age" : -1}}
]).explain("executionStats")

The result was:

"errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",

So I deleted the sort clause and tried to get an explain, but the result was:

TypeError: db.businesses.aggregate(...).explain is not a function

So my questions are:

Primarily, I want to know how the performance of SELECT age FROM employees WHERE age >= 50 ORDER BY age DESC compares with that of its MongoDB aggregate query counterpart. Is it more or less the same? Will one be substantially faster than the other?
Alternatively, how do I fix my MongoDB query so that I can get performance details to compare against my MySQL query?

learningtech · Accepted Answer · 2019-11-28 20:33:14Z

2

Employees are single entities; thus, you probably don't want to model age of a team member so deeply in the rich structure of departments and locations and teams. It is perfectly fine to have a separate employees collection and simply do:

db.employees.aggregate([
{$match: {"age": {$gte: 50} }}
,{$sort: {"age": -1} }
]);

Deep in your businesses collection you can have:

{ teams: [ {name: "T1", employees: [ "E1", "E34" ]} ] }

Alternately, try this:

db.businesses.aggregate([ /* your pipeline stages */ ], {allowDiskUse: true});

The OP has a setup of 10 biz -> 10 loc -> 10 depts -> 10 teams -> 100 emps. The first 3 unwinds expand the 10 business documents into 10,000 team documents, and the last one multiplies that by another 100, to 1,000,000 employee documents. We can shrink the hit by using $filter before the final unwind:

db.businesses.aggregate([
{ $unwind: "$locations" },
{ $unwind: "$locations.departments" },
{ $unwind: "$locations.departments.teams" },

{$project: {
        XX: {$filter: {
                    input: "$locations.departments.teams.employees",
                    as: "z",
                    cond: {$gte: [ "$$z.age", 50] }
            }}
    }}
,{$unwind: "$XX"}
,{$sort: {"XX.age":-1}}])
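The growth in document count can be checked with a little arithmetic. What follows is a minimal sketch in plain JavaScript that never touches the database; FAN_OUT and docsAfterEachUnwind are names made up for illustration, and the fan-out numbers are the ones from the OP's setup.

```javascript
// Fan-out per nesting level, taken from the setup described in the question.
const FAN_OUT = [
  ["locations", 10],
  ["departments", 10],
  ["teams", 10],
  ["employees", 100],
];

// Hypothetical helper: returns the number of documents flowing through the
// pipeline after each $unwind, starting from `rootDocs` business documents.
function docsAfterEachUnwind(rootDocs) {
  let docs = rootDocs;
  return FAN_OUT.map(([level, perParent]) => {
    docs *= perParent; // every $unwind emits one document per array element
    return { level, docs };
  });
}

for (const { level, docs } of docsAfterEachUnwind(10)) {
  console.log(`after $unwind ${level}: ${docs} documents`);
}
```

With 10 businesses this gives 100, 1,000, 10,000 and finally 1,000,000 documents. With $filter applied before the last unwind, only the employees that pass the age test are unwound, which should be roughly half of them if ages are uniformly random as in the OP's generator.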

edited Nov 28, 2019 at 20:33

learningtech

answered Nov 28, 2019 at 17:06

Buzz Moschetti

4 Comments

learningtech Over a year ago

Won't having employees as separate entities bring me back to my original problem of this stackoverflow.com/questions/59080541/… ? Which is what spawned a series of 4-5 questions that got me to this current question.

learningtech Over a year ago

As for the results of my experiment: the MongoDB query took 3 seconds when I used {allowDiskUse:true}. The MySQL SELECT age FROM t_employee WHERE age >= 50 ORDER BY age DESC took 1.2 seconds. My guess is MySQL doesn't enable allowDiskUse by default. No indices were applied in either scenario. Is this result expected for this scenario? Do others also see MongoDB being 3x slower than MySQL?

Buzz Moschetti Over a year ago

Working on a map based alternative to bypass the 100X explosion of employee at the end....

learningtech Over a year ago

Thanks, with the filter approach plus the {allowDiskUse:true}, the MongoDB query took 2 seconds. So that's a good improvement from the 3 seconds!

Okkano · Accepted Answer · 2019-11-28 17:11:30Z

1

You'd better move $match to the first stage of the pipeline, because the aggregation framework can only use an index for stages at the start of the pipeline. Also, I guess you don't need to unwind those arrays.

answered Nov 28, 2019 at 17:11

Okkano

Dĵ ΝιΓΞΗΛψΚ · Accepted Answer · 2019-11-28 16:29:21Z

0

I was able to get a result in 1.5 seconds without any indexes by modifying the query as follows:

db.businesses.aggregate([
    {
        $unwind: "$locations"
    },
    {
        $unwind: "$locations.departments"
    },
    {
        $unwind: "$locations.departments.teams"
    },
    {
        $unwind: "$locations.departments.teams.employees"
    },
    {
        $match: {
            "locations.departments.teams.employees.age": {
                $gte: 50
            }
        }
    },
    {
        $project: {
            _id: 0,
            age: "$locations.departments.teams.employees.age"
        }
    },
    {
        $group: {
            _id: "$age"
        }
    },
    {
        $project: {
            _id: 0,
            age: "$_id"
        }
    },
    {
        $sort: {
            "age": - 1
        }
    }
], {
    explain: false
})
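Note that the $group stage with _id: "$age" keeps only one document per distinct age, so the final $sort has at most 50 documents to order instead of every matching employee. Here is a minimal in-memory sketch of that effect in plain JavaScript (not the mongo shell); distinctAgesDesc and the sample names are made up for illustration.

```javascript
// Hypothetical helper mirroring $match + $group {_id: "$age"} + $sort:
// keep employees with age >= minAge, collapse to distinct ages, sort descending.
function distinctAgesDesc(employees, minAge) {
  const ages = new Set(
    employees.filter(emp => emp.age >= minAge).map(emp => emp.age)
  );
  return [...ages].sort((a, b) => b - a);
}

// Illustrative data: two employees share the age 60.
const staff = [
  { name: "A", age: 60 },
  { name: "B", age: 60 },
  { name: "C", age: 70 },
  { name: "D", age: 40 },
];

console.log(distinctAgesDesc(staff, 50)); // [ 70, 60 ]
```

Three employees match the age test, but only two values come back and the names are gone, so this shape answers "which ages occur" rather than "which employees match".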

answered Nov 28, 2019 at 16:29

Dĵ ΝιΓΞΗΛψΚ

3 Comments

learningtech Over a year ago

Thanks, the {$group:{_id:"$age"}} effectively does the equivalent of GROUP BY age right? Which means if my employee has a name field, then I won't accurately get back the name of the employee along with their age (ie. SELECT name,age FROM employee)?

learningtech Over a year ago

When I remove the {$group:{_id:"$age"}} stage but leave in the sort stage, my mongo still crashes from not having enough memory. I don't have the same problem with MySQL. No indices were applied in either scenario.

learningtech Over a year ago

In MySQL, SELECT name,age FROM employee WHERE age>=50 ORDER BY name DESC also takes about 1 second. No indices were applied. I still can't get the MongoDB equivalent to not crash...

Buzz Moschetti · Accepted Answer · 2019-11-29 15:05:23Z

0

There is another way to address the overall problem, although it is not apples to apples with the OP question. The goal is to find all age >= 50 and sort. Below is an example that "almost" does so and throws in the loc,dept,team as well in case you were wondering how to get that too, but you can take out the lines to get just the emps. Now, this is unsorted -- but an argument can be made that the DB engine isn't going to do any better job of sorting this than the client and all the data has to come over the wire anyway. And the client can use more sophisticated coding tricks to dig thru to the age field and sort it.

c = db.foo.aggregate([
{$project: {XX:
  {$map: {input: "$locations", as:"z", in:
          {$map: {input: "$$z.departments", as:"z2", in:
                  {$map: {input: "$$z2.teams", as:"z3", in:
                          {loc: "$$z.name",  // remove if you want
                           dept: "$$z2.name", // remove if you want
                           team: "$$z3.name",  // remove if you want
                           emps: {$filter: {input: "$$z3.employees",
                                     as: "z4",
                                     cond: {$gt: [ "$$z4.age", 50] }
                                    }}
                          }
                      }}
              }}
      }}
    }}
]);

ages = [];

c.forEach(function(biz) {
    biz['XX'].forEach(function(locs) {
        locs.forEach(function(depts) {
            depts.forEach(function(teams) {
                teams['emps'].forEach(function(emp) {
                    ages.push(emp['age']);
                });
            });
        });
    });
});

print( ages.sort(function(a, b){return b-a}) );

99,98,97,96,95,94,92,92,84,81,78,77,76,72,71,67,66,65,65,64,63,62,62,61,59,59,57,57,57,56,55,54,52,51
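The same client-side idea can be written more compactly with flatMap. This is a minimal sketch in plain JavaScript, assuming the client holds the raw documents in the shape the OP inserted (not the $map projection above); agesAtLeast and the sample data are made up for illustration.

```javascript
// Hypothetical helper: walk documents shaped like the OP's `businesses`
// collection, collect every employee age >= minAge, and sort descending.
function agesAtLeast(businesses, minAge) {
  return businesses
    .flatMap(biz => biz.locations)
    .flatMap(loc => loc.departments)
    .flatMap(dept => dept.teams)
    .flatMap(team => team.employees)
    .map(emp => emp.age)
    .filter(age => age >= minAge)
    .sort((a, b) => b - a);
}

// Tiny illustrative document: 1 location, 1 department, 2 teams.
const sample = [{
  name: "Business Name",
  locations: [{
    name: "Office Location",
    departments: [{
      name: "Department",
      teams: [
        { name: "Team Name", employees: [{ age: 72 }, { age: 12 }, { age: 50 }] },
        { name: "Team Name", employees: [{ age: 49 }, { age: 99 }] },
      ],
    }],
  }],
}];

console.log(agesAtLeast(sample, 50)); // [ 99, 72, 50 ]
```

Array.prototype.flatMap needs a reasonably modern JavaScript engine (Node.js 11 or later), so this is meant for a client program rather than the legacy mongo shell.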

On a MacBook Pro running MongoDB 4.0, we see the collection as follows:

Collection            Count   AvgSize          Unz  Xz  +Idx     TotIdx  Idx/doc
--------------------  ------- -------- -G--M------  --- ---- ---M------  -------
                 foo       10   2238682     22386820  4.0    0      16384    0

Given the random age between 0 and 100, it is not surprising that every loc/dept/team has age >= 50 and that the total number of bytes returned is about half. Note, however that the total time to set up the agg -- not return all the bytes -- is ~700 millis.

697 millis to agg; 0.697
found 10
tot bytes 11536558

edited Nov 29, 2019 at 15:05

answered Nov 28, 2019 at 18:30

Buzz Moschetti

7 Comments

learningtech Over a year ago

Also on your machine, does your query performance match that of MySQL?

Buzz Moschetti Over a year ago

I only have postgres. :-) And I wouldn't know the "best" way to model SQL to be performant. I could do a 4 way left outer join biz->loc->dept->teams->emps and filter for age >= 50 but I just haven't tried.

learningtech Over a year ago

In my scenario, no joins would be necessary, because I'd only be selecting fields from the employees table, and there's no need to reference any of the other tables. So you're simply doing a SELECT name FROM employees WHERE age >= 50 ORDER BY age DESC. I'm curious how Postgres performs relative to the MongoDB query you showed above!

Buzz Moschetti Over a year ago

Oh! I thought the goal was to identify emps >= 50 who were on teams (based on a sample nested SELECT example in your other post), not just the basic emps >= 50. No?

learningtech Over a year ago

That's correct, I just want the names of emps with age >= 50 sorted in descending order of age. I don't need details of their business, locations, departments or teams. I can't find a MongoDB query that can perform as fast or efficiently as SELECT name FROM employees WHERE age >= 50 ORDER BY age DESC.
