Is the MongoDB aggregation framework faster than map/reduce?

Question

Does the aggregation framework introduced in MongoDB 2.2 have any special performance improvements over map/reduce?

If yes, why, how, and by how much?

(I have already run a test myself, and the performance was nearly the same.)

"nearly" the same? With which benchmarks? Your remark is basically pointless. And you are comparing cats and cows. In addition, you know yourself that MR is still limited to single-threading... so: pointless question and therefore -1
– user2665694, Commented Dec 17, 2012 at 6:01
@user1833746 It's a question, I don't want to explain my benchmarks. I asked in order to get new answers to this question. Please vote up to allow others to answer.
– Taha Jahangir, Commented Dec 17, 2012 at 6:59
Have you seen this question (and answers)? stackoverflow.com/questions/12139149/…
– Asya Kamsky, Commented Dec 17, 2012 at 8:57
Please refer to this link for a better understanding. runnable.com/blog/…
– soheshdoshi, Commented Feb 12, 2021 at 10:22

Asya Kamsky · Accepted Answer · 2012-12-17 10:15:05Z

66

Every test I have personally run (including tests using your own data) shows the aggregation framework to be several times faster than map/reduce, and usually an order of magnitude faster.

Taking just 1/10th of the data you posted (and warming the cache first rather than clearing the OS cache, because I want to measure the performance of the aggregation, not how long it takes to page in the data), I got this:

MapReduce: 1,058ms
Aggregation Framework: 133ms

Removing the $match from aggregation framework and {query:} from mapReduce (because both would just use an index and that's not what we want to measure) and grouping the entire dataset by key2 I got:

MapReduce: 18,803ms
Aggregation Framework: 1,535ms

Those are very much in line with my previous experiments.
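
The ratios implied by the two pairs of timings above can be worked out directly. This is only a small arithmetic sketch using the four numbers quoted in this answer; the dictionary labels and the `speedup` helper are names chosen here for illustration.

```python
# Timings quoted above, in milliseconds.
timings_ms = {
    "with $match / query": {"map_reduce": 1058, "aggregation": 133},
    "no filter, group by key2": {"map_reduce": 18803, "aggregation": 1535},
}


def speedup(map_reduce_ms, aggregation_ms):
    """Return how many times faster the aggregation run was."""
    return map_reduce_ms / aggregation_ms


speedups = {
    name: speedup(t["map_reduce"], t["aggregation"])
    for name, t in timings_ms.items()
}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

That works out to roughly 8x for the filtered run and roughly 12x for the unfiltered one, which matches the "order of magnitude" description.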



3 Comments

Asya Kamsky Over a year ago

for additional comments on this see answer to stackoverflow.com/questions/12139149/…

Jeach Over a year ago

Thanks for answering the first portion of the question! What about the second part? Why and how? Do you have something to add for that? Thank you for any input.

Asya Kamsky Over a year ago

This is covered in the docs, but in a nutshell: aggregation runs natively in the server (C++), while MapReduce spawns separate JavaScript thread(s) to run JS code.

Taha Jahangir · Accepted Answer · 2012-12-17 09:23:15Z

9

My benchmark:

== Data Generation ==

Generate 4 million rows (with Python), each approximately 350 bytes. Each document has these keys:

key1, key2 (two random columns to test indexing, with cardinalities of 20 and 2000 respectively)
baddata: a long string to increase the size of each document
value: a simple number (const 10) to test aggregation

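import os
import random
from binascii import hexlify

from pymongo import MongoClient  # replaces the older Connection class

db = MongoClient('127.0.0.1').test  # mongo connection
random.seed(1)  # seeds random.choice only; os.urandom is unaffected
for _ in range(2):
    # 2 x 10 = 20 distinct key1 values, 2 x 1000 = 2000 distinct key2 values
    key1s = [hexlify(os.urandom(10)).decode('ascii') for _ in range(10)]
    key2s = [hexlify(os.urandom(10)).decode('ascii') for _ in range(1000)]
    baddata = 'some long date ' + '*' * 300
    for i in range(2000):
        # 2 x 2000 x 1000 = 4 million documents in total
        data_list = [{
                'key1': random.choice(key1s),
                'key2': random.choice(key2s),
                'baddata': baddata,
                'value': 10,
                } for _ in range(1000)]
        # insert_many replaces the per-document save() of older drivers
        db.testtable.insert_many(data_list)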

Total data size was about 6 GB in MongoDB (and 2 GB in PostgreSQL).

== Tests ==

I ran several tests, but one is enough to compare results:

NOTE: Server is restarted, and OS cache is cleaned after each query, to ignore effect of caching.

QUERY: aggregate all rows with key1=somevalue (about 200K rows) and sum value for each key2

map/reduce 10.6 sec
aggregate 9.7 sec
group 10.3 sec

queries:

map/reduce:

db.testtable.mapReduce(
    function () { emit(this.key2, this.value); },
    function (key, values) {
        var i = 0;
        values.forEach(function (v) { i += v; });
        return i;
    },
    { out: { inline: 1 }, query: { key1: '663969462d2ec0a5fc34' } }
)

aggregate:

db.testtable.aggregate([
    { $match: { key1: '663969462d2ec0a5fc34' } },
    { $group: { _id: '$key2', pop: { $sum: '$value' } } }
])

group:

db.testtable.group({
    key: { key2: 1 },
    cond: { key1: '663969462d2ec0a5fc34' },
    reduce: function (obj, prev) { prev.csum += obj.value; },
    initial: { csum: 0 }
})
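
All three queries are meant to compute the same thing: the sum of value per key2 among documents with a given key1. The sketch below models each one over a tiny in-memory list, only to show that equivalence. It says nothing about how the server executes them, and the sample documents and function names are made up for illustration.

```python
from collections import defaultdict

# Tiny stand-in for the collection. Field names follow the benchmark;
# the documents themselves are invented for this example.
docs = [
    {"key1": "a", "key2": "x", "value": 10},
    {"key1": "a", "key2": "y", "value": 10},
    {"key1": "a", "key2": "x", "value": 10},
    {"key1": "b", "key2": "x", "value": 10},
]


def map_reduce_sum(docs, key1):
    """Model of mapReduce: emit (key2, value) per matching doc, then sum."""
    emitted = defaultdict(list)
    for doc in docs:
        if doc["key1"] == key1:                        # the `query` option
            emitted[doc["key2"]].append(doc["value"])  # emit(key2, value)
    return {k: sum(vs) for k, vs in emitted.items()}   # the reduce function


def aggregate_sum(docs, key1):
    """Model of the pipeline: $match on key1, then $group by key2 with $sum."""
    totals = defaultdict(int)
    for doc in (d for d in docs if d["key1"] == key1):
        totals[doc["key2"]] += doc["value"]
    return dict(totals)


def group_sum(docs, key1):
    """Model of group(): start each key from `initial`, fold docs into it."""
    groups = {}
    for doc in docs:
        if doc["key1"] != key1:                        # the `cond` option
            continue
        prev = groups.setdefault(doc["key2"], {"csum": 0})  # `initial`
        prev["csum"] += doc["value"]                   # reduce(obj, prev)
    return {k: g["csum"] for k, g in groups.items()}


print(map_reduce_sum(docs, "a"))
print(aggregate_sum(docs, "a"))
print(group_sum(docs, "a"))
```

Since the results are identical, any timing difference between the three comes from how each one is executed, not from what it computes.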


3 Comments

Asya Kamsky Over a year ago

group is not part of the aggregation framework; it's part of map/reduce. That's why it has a reduce function. See the difference here: docs.mongodb.org/manual/reference/command/group and docs.mongodb.org/manual/reference/aggregation/#_S_group If you were using the aggregation framework, you would call db.collection.aggregate( [ pipeline ] )

Asya Kamsky Over a year ago

I have a suggestion: why don't you take out the query and run the same thing on your entire collection and see if there is a difference in performance.

Asya Kamsky Over a year ago

Another problem with your benchmark is that you cleared the OS cache, so you were measuring mostly the time it takes to page the data into RAM. That dwarfs the actual query performance, and it's not a realistic scenario.
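
One way to act on this is to run each query a few times untimed before measuring, so that the timed runs hit a warm cache. The helper below is a minimal sketch of that idea, not the harness anyone in this thread actually used. `time_warm`, `run_query`, `warmups` and `repeats` are hypothetical names, and the lambda at the end is a stand-in for a real call such as one that issues the aggregate or mapReduce query.

```python
import statistics
import time


def time_warm(run_query, warmups=1, repeats=5):
    """Call run_query untimed `warmups` times, then return the median
    wall-clock time in milliseconds over `repeats` timed calls."""
    for _ in range(warmups):
        run_query()  # warm-up: pages data into memory, result discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Placeholder workload; in a real benchmark this would issue the query.
elapsed_ms = time_warm(lambda: sum(range(100_000)))
print(f"median: {elapsed_ms:.2f} ms")
```

Taking the median rather than a single run reduces the effect of one-off stalls on the reported number.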
