I'm currently testing some databases for my application. The main functionality is data aggregation (similar to this question: Data aggregation mongodb vs mysql).
I'm facing the same problem. I've created a sample test data set. There are no joins on the MySQL side; it's a single InnoDB table. The data set has 1.6 million rows, and I'm doing a sum and a count over the full table, without any filter, so I can compare the performance of each engine's aggregation. All data fits in memory in both cases, and in both cases there is no write load.
With MySQL (5.5.34-0ubuntu0.12.04.1) the results are consistently between 2.03 and 2.10 seconds. With MongoDB (2.4.8, Linux 64-bit) the results are consistently between 4.1 and 4.3 seconds.
If I filter on indexed fields, the MySQL time drops to between 1.18 and 1.20 seconds (the number of rows processed drops to exactly half the data set). If I apply the same filter on indexed fields in MongoDB, the time drops only to around 3.7 seconds (again processing half the data set, which I confirmed with an explain on the match criteria).
My conclusion is that either 1) my documents are extremely badly designed (which may well be true), or 2) the MongoDB aggregation framework really does not fit my needs.
The questions are: what can I do (in terms of specific MongoDB configuration, document modeling, etc.) to make Mongo's results faster? Or is this a case MongoDB is simply not suited for?
My table and document schemas:
CREATE TABLE `events_normal` (
  `origem` varchar(35) DEFAULT NULL,
  `destino` varchar(35) DEFAULT NULL,
  `qtd` int(11) DEFAULT NULL,
  KEY `idx_orides` (`origem`,`destino`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
{
"_id" : ObjectId("52adc3b444ae460f2b84c272"),
"data" : {
"origem" : "GRU",
"destino" : "CGH",
"qtdResultados" : 10
}
}
The indexed fields, which are also the ones filtered on where a filter is mentioned, are "origem" and "destino".
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal group by origem, destino;
select sql_no_cache origem, destino, sum(qtd), count(1) from events_normal where origem="GRU" group by origem, destino;
db.events.aggregate( {$group: { _id: {origem: "$data.origem", destino: "$data.destino"}, total: {$sum: "$data.qtdResultados" }, qtd: {$sum: 1} } } )
db.events.aggregate( {$match: {"data.origem":"GRU" } } , {$group: { _id: {origem: "$data.origem", destino: "$data.destino"}, total: {$sum: "$data.qtdResultados" }, qtd: {$sum: 1} } } )
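To make clear what both engines are being asked to compute, here is a minimal plain-JavaScript sketch of the same group-by (sum plus count per origem/destino pair), with an optional filter standing in for the WHERE / $match. The sample documents and the function name aggregateEvents are made up for illustration; only the field names come from my schema. It is a reference for checking that both databases return the same groups, not a benchmark.

```javascript
// Hypothetical sample documents with the same shape as the events collection.
const sampleEvents = [
  { data: { origem: "GRU", destino: "CGH", qtdResultados: 10 } },
  { data: { origem: "GRU", destino: "CGH", qtdResultados: 5 } },
  { data: { origem: "GRU", destino: "SDU", qtdResultados: 7 } },
  { data: { origem: "CGH", destino: "GRU", qtdResultados: 3 } },
];

// Group by (origem, destino), summing qtdResultados and counting documents.
// The optional origemFilter plays the role of the WHERE clause / $match stage.
function aggregateEvents(events, origemFilter) {
  const groups = new Map();
  for (const { data } of events) {
    if (origemFilter !== undefined && data.origem !== origemFilter) continue;
    // Serialize the pair so it can be used as a Map key.
    const key = JSON.stringify([data.origem, data.destino]);
    let group = groups.get(key);
    if (!group) {
      group = { _id: { origem: data.origem, destino: data.destino }, total: 0, qtd: 0 };
      groups.set(key, group);
    }
    group.total += data.qtdResultados;
    group.qtd += 1;
  }
  return Array.from(groups.values());
}

const allGroups = aggregateEvents(sampleEvents);        // unfiltered case
const gruGroups = aggregateEvents(sampleEvents, "GRU"); // filtered case
console.log(allGroups);
console.log(gruGroups);
```

The output objects use the same field names (_id, total, qtd) as the $group stage above, so they can be compared directly against the aggregate() result on a small sample.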
Thanks!