MySQL Vs MongoDB aggregation performance

Question

I'm currently testing several databases for my application. Its main functionality is data aggregation (similar to this question: Data aggregation mongodb vs mysql).

I'm facing the same problem. I've created some sample test data. There are no joins on the MySQL side; it's a single InnoDB table. The data set has 1.6 million rows, and I'm doing a sum and a count over the full table, without any filter, so I can compare the performance of each database's aggregation engine. In both cases, all data fits in memory and there is no write load.

With MySQL (5.5.34-0ubuntu0.12.04.1) I consistently get results between 2.03 and 2.10 seconds. With MongoDB (2.4.8, Linux 64-bit) I consistently get results between 4.1 and 4.3 seconds.

If I filter on indexed fields, the MySQL time drops to between 1.18 and 1.20 seconds (the number of rows processed drops to exactly half the data set). If I apply the same filter on indexed fields in MongoDB, the time drops only to around 3.7 seconds (again processing half the data set, which I confirmed by running explain on the match criteria).

My conclusion is that either 1) my documents are extremely badly designed (which may well be true), or 2) the MongoDB aggregation framework really does not fit my needs.

My questions are: what can I do (in terms of specific MongoDB configuration, document modeling, etc.) to make MongoDB's results faster? Or is this a use case MongoDB is not suited for?

My table and document schemas:

CREATE TABLE `events_normal` (
  `origem` varchar(35) DEFAULT NULL,
  `destino` varchar(35) DEFAULT NULL,
  `qtd` int(11) DEFAULT NULL,
  KEY `idx_orides` (`origem`,`destino`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

{
    "_id" : ObjectId("52adc3b444ae460f2b84c272"),
    "data" : {
        "origem" : "GRU",
        "destino" : "CGH",
        "qtdResultados" : 10
    }
}

The indexed fields used when filtering are "origem" and "destino".

select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal group by origem, destino;
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal where origem="GRU" group by origem, destino;

db.events.aggregate(
    {$group: {
        _id: {origem: "$data.origem", destino: "$data.destino"},
        total: {$sum: "$data.qtdResultados"},
        qtd: {$sum: 1}
    }}
)
db.events.aggregate(
    {$match: {"data.origem": "GRU"}},
    {$group: {
        _id: {origem: "$data.origem", destino: "$data.destino"},
        total: {$sum: "$data.qtdResultados"},
        qtd: {$sum: 1}
    }}
)
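To make the comparison concrete, here is a minimal plain-JavaScript model of what the two pipelines compute over the document shape shown above. It runs in Node.js without MongoDB and says nothing about performance; `groupEvents` and the sample documents are hypothetical. Passing an `origem` value plays the role of the `$match` stage, and each resulting group corresponds to one row of the SQL `GROUP BY origem, destino` with `sum(qtd)` and `count(1)`.

```javascript
// Hypothetical in-memory model of the two aggregate() calls above.
// It only mirrors what the pipelines compute; it is not MongoDB code.
function groupEvents(docs, origemFilter = null) {
  const groups = new Map();
  for (const doc of docs) {
    const { origem, destino, qtdResultados } = doc.data;
    // Plays the role of {$match: {"data.origem": origemFilter}} when a filter is given.
    if (origemFilter !== null && origem !== origemFilter) continue;
    // Compound group key, like _id: {origem: ..., destino: ...}.
    const key = JSON.stringify([origem, destino]);
    let group = groups.get(key);
    if (!group) {
      group = { _id: { origem, destino }, total: 0, qtd: 0 };
      groups.set(key, group);
    }
    // total: {$sum: "$data.qtdResultados"} skips non-numeric values.
    if (typeof qtdResultados === "number") group.total += qtdResultados;
    // qtd: {$sum: 1} counts documents, like count(1) in the SQL version.
    group.qtd += 1;
  }
  return [...groups.values()];
}

// Hypothetical sample documents with the same shape as the question's schema.
const sample = [
  { data: { origem: "GRU", destino: "CGH", qtdResultados: 10 } },
  { data: { origem: "GRU", destino: "CGH", qtdResultados: 5 } },
  { data: { origem: "GRU", destino: "SDU", qtdResultados: 3 } },
  { data: { origem: "CGH", destino: "GRU", qtdResultados: 7 } },
];

console.log(groupEvents(sample));        // three groups
console.log(groupEvents(sample, "GRU")); // GRU/CGH: total 15, qtd 2; GRU/SDU: total 3, qtd 1
```

Both versions do the same logical work per row (one key lookup plus two additions), which is why the question can treat them as a like-for-like benchmark of the engines.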

Thanks!

The aggregation framework is not currently as fast as SQL's, which is old (in a good way) and extremely mature, with parallel threading etc. There are numerous improvements to be made in the next few versions. That said, by "above 4 seconds" do you mean massively above, like 6 or 7 seconds, or just 4.01 seconds?
– Sammaye, Commented Dec 15, 2013 at 17:20
@Sammaye Sorry for not being specific: it's around 4.1 seconds, sometimes 4.3. I've corrected that in the question.
– Marcos Vinícius da Silva, Commented Dec 15, 2013 at 17:29

Philipp · Accepted Answer · 2013-12-15 16:41:57Z

5

Aggregation is not really what MongoDB was originally designed for, so it's not its fastest feature.

If you really want to use MongoDB, you could use sharding so that each shard processes its share of the aggregation (make sure to choose the shard key so that each group lives on only one shard, or you will achieve the opposite). This, however, would no longer be a fair comparison with MySQL, because the MongoDB cluster would use a lot more hardware.
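The idea of each shard processing its share can be pictured with a toy model. This is a simplified sketch under stated assumptions, not how mongos actually executes a sharded pipeline: each hypothetical shard is an array of documents, the hypothetical `partialGroup` computes per-shard totals and counts, and the hypothetical `mergePartials` adds the partials together, which works because sums and counts are additive. It also counts how many groups ended up spread over several shards, the situation the shard-key advice above is meant to avoid.

```javascript
// Toy model of splitting a $group across shards (not how mongos actually works).
// Each hypothetical shard is an array of documents shaped like the question's schema.

// Per-shard step: total and count for each (origem, destino) group on that shard.
function partialGroup(shardDocs) {
  const partial = new Map();
  for (const { data } of shardDocs) {
    const key = JSON.stringify([data.origem, data.destino]);
    const p = partial.get(key) || { total: 0, qtd: 0 };
    p.total += data.qtdResultados;
    p.qtd += 1;
    partial.set(key, p);
  }
  return partial;
}

// Merge step: sums and counts are additive, so partials for the same key are added.
// overlappingKeys counts the groups whose documents were spread over several shards.
function mergePartials(partials) {
  const merged = new Map();
  let overlappingKeys = 0;
  for (const partial of partials) {
    for (const [key, p] of partial) {
      const m = merged.get(key);
      if (m) {
        if (m.shards === 1) overlappingKeys += 1; // count each split group once
        m.total += p.total;
        m.qtd += p.qtd;
        m.shards += 1;
      } else {
        merged.set(key, { total: p.total, qtd: p.qtd, shards: 1 });
      }
    }
  }
  return { merged, overlappingKeys };
}

// Shard key chosen so every group lives on one shard: the merge only collects results.
const byOrigem = [
  [{ data: { origem: "GRU", destino: "CGH", qtdResultados: 10 } },
   { data: { origem: "GRU", destino: "CGH", qtdResultados: 5 } }],
  [{ data: { origem: "CGH", destino: "GRU", qtdResultados: 7 } }],
];
console.log(mergePartials(byOrigem.map(partialGroup)).overlappingKeys); // 0

// Same GRU/CGH group split over two shards: the merge has to add the partials.
const split = [
  [{ data: { origem: "GRU", destino: "CGH", qtdResultados: 10 } }],
  [{ data: { origem: "GRU", destino: "CGH", qtdResultados: 5 } }],
];
console.log(mergePartials(split.map(partialGroup)).overlappingKeys); // 1
```

In this toy model the merged totals come out the same either way (GRU/CGH ends with total 15, qtd 2); the difference is only how much combining work the final step has to do.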


3 Comments

Marcos Vinícius da Silva Over a year ago

Hi Philipp, thanks for the answer! In that case, I could shard the MySQL table too, using MySQL Cluster, and theoretically get even faster results, right?

Philipp Over a year ago

@MarcosViníciusdaSilva Maybe. I have never used MySQL sharding. MongoDB was designed from the ground up for sharding, while MySQL added sharding as an afterthought, so I would assume that MongoDB benefits more, but that is just my estimate, not backed by any experience or facts.

Marcos Vinícius da Silva Over a year ago

Thanks for the answers. I've concluded that I'll need a more elaborate strategy to keep the results fast, in any database.
