Generating Mongo query from MySQL query

Question

I have been using the following MySQL command to construct a heatmap from log data. However, I have a new data set that is stored in a Mongo database and I need to run the same command.

 select concat(a.packages, '&', b.packages) "Concurrent Packages",
 count(*) "Count"
 from data a
 cross join data b
 where a.packages<b.packages and a.jobID=b.jobID
 group by a.packages, b.packages
 order by a.packages, b.packages;

Keep in mind that a and b are not pre-existing tables: they are two aliases for the same data table, which is joined to itself on jobID, the field I want to match on. In other words, if two packages are used within the same job, I want to add one to their concurrent-usage count. How can I write a similar query in MongoDB?
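To pin down what the self-join computes, here is a small plain-JavaScript model of it (runnable in Node.js; it is not a MongoDB query). It assumes flat rows shaped like { jobID, packages } with one package per row, as in the SQL table; the function name countConcurrentPairs is made up for illustration. Comparison uses JavaScript's < operator, so ordering and case handling may differ from MySQL's collation.

```javascript
// Plain-JavaScript model of the SQL self-join (not a MongoDB query).
// Assumed row shape: { jobID: "Job01", packages: "pA" }, one package per row.
function countConcurrentPairs(rows) {
  const groups = new Map(); // JSON key of [a, b] -> { a, b, count }
  // "cross join data b": every row is paired with every row...
  for (const a of rows) {
    for (const b of rows) {
      // ...but only same-job pairs survive, each unordered pair once.
      if (a.packages < b.packages && a.jobID === b.jobID) {
        const key = JSON.stringify([a.packages, b.packages]);
        const g = groups.get(key) || { a: a.packages, b: b.packages, count: 0 };
        g.count += 1; // count(*) per group
        groups.set(key, g);
      }
    }
  }
  const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  return [...groups.values()]
    .sort((x, y) => cmp(x.a, y.a) || cmp(x.b, y.b)) // order by a, b
    .map((g) => ({ "Concurrent Packages": g.a + "&" + g.b, Count: g.count }));
}

const rows = [
  { jobID: "Job01", packages: "pA" },
  { jobID: "Job01", packages: "pB" },
  { jobID: "Job02", packages: "pA" },
  { jobID: "Job02", packages: "pB" },
  { jobID: "Job02", packages: "pC" },
];
console.log(countConcurrentPairs(rows));
```

For these five rows the result is pA&pB: 2, pA&pC: 1 and pB&pC: 1, since pA and pB appear together in both jobs.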

What have you tried? Have you looked at this page for inspiration? (WiredPrairie, commented Mar 8, 2013 at 2:52)

ronasta · Accepted Answer · 2013-04-02 01:52:34Z

This is not a "join" of different documents; it is an operation within one document, and can be done in MongoDB.

You have a SQL table "data" with one row per (job, package) combination, like this:
  jobID    TEXT,
  packages TEXT

The best way to store this in MongoDB is a collection called "data", containing one document per jobID with an array of that job's packages:

{
    _id: <JobID>,
    packages: [
        "packageA",
        "packageB",
        ....
    ]
}
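If the existing data is still in the flat one-row-per-package form, it has to be regrouped into this shape first. A minimal sketch in plain JavaScript, assuming rows shaped like { jobID, packages } (one package per row) and a hypothetical helper name rowsToJobDocuments; the resulting array could then be loaded with db.data.insertMany(docs) in the shell.

```javascript
// Hypothetical helper: regroup flat rows { jobID, packages } (one package per
// row) into one document per job, { _id: <jobID>, packages: [ ... ] }.
function rowsToJobDocuments(rows) {
  const byJob = new Map(); // jobID -> package names, jobs in first-seen order
  for (const row of rows) {
    if (!byJob.has(row.jobID)) byJob.set(row.jobID, []);
    byJob.get(row.jobID).push(row.packages);
  }
  return [...byJob.entries()].map(([jobID, packages]) => ({
    _id: jobID,
    packages: packages.slice().sort(), // sorting is optional but cheap here
  }));
}

const docs = rowsToJobDocuments([
  { jobID: "Job01", packages: "pB" },
  { jobID: "Job01", packages: "pA" },
  { jobID: "Job02", packages: "pC" },
]);
console.log(JSON.stringify(docs));
```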

[ Note: you could also store the whole data table as a single MongoDB document containing an array of jobs, each with its own array of packages. This is not recommended: you might hit the 16MB document size limit, and nested arrays are not (yet) well supported by many queries, which matters if you want to use the data for other purposes as well. ]

Now, how do we get a result like this?

{ pair: [ "packageA", "packageB" ], count: 20 },
{ pair: [ "packageA", "packageC" ], count: 11 },
...

As there is no built-in "cross join" of two arrays in MongoDB, you'll have to program it out in the map function of a mapReduce(), emitting each pair of packages as a key:

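var mapf = function () {
    var that = this;
    // Pair every package with every package of the same job; the p1 < p2
    // test keeps each unordered pair once (like a.packages < b.packages).
    this.packages.forEach( function( p1 ) {
        that.packages.forEach( function( p2 ) {
            if ( p1 < p2 ) {
                emit( { "pair": [ p1, p2 ] }, 1 );
            }
        });
    });
};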

[ Note: this could be optimized if the packages arrays were sorted ]
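A sketch of that optimization, assuming each packages array is already stored in ascending order (for example, sorted when the document is written; this is an assumption, the schema above does not enforce it). The inner loop starts after the current element, so each unordered pair is looked at once instead of comparing all n × n combinations. It is meant as a drop-in replacement for mapf; emit() is supplied by MongoDB when the function runs inside mapReduce().

```javascript
// Drop-in alternative to mapf, ASSUMING this.packages is already sorted
// ascending. emit() is provided by MongoDB inside mapReduce().
var mapfSorted = function () {
    var pkgs = this.packages;
    for (var i = 0; i < pkgs.length; i++) {
        // Only look at later elements: in a sorted array they are >= pkgs[i].
        for (var j = i + 1; j < pkgs.length; j++) {
            // Skip equal values (duplicate packages), matching p1 < p2 in mapf.
            if (pkgs[i] < pkgs[j]) {
                emit( { "pair": [ pkgs[i], pkgs[j] ] }, 1 );
            }
        }
    }
};
```

With duplicates in a job, this emits the same pairs as the original mapf, so the counts do not change.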

The reduce function is nothing more than summing up the counters for each key:

var reducef = function( key, values ) {
    var count = 0;
    values.forEach( function( value ) { count += value; } );
    return count;
};
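One property worth checking: MongoDB may call the reduce function more than once for the same key, passing an earlier reduce result back in as one of the values, so reduce must return the same type as the emitted values and give the same answer however the values are grouped. A plain sum of numbers satisfies this. A quick check runnable in Node.js (key, oneShot and inStages are illustrative names):

```javascript
// Same summing reduce as above.
var reducef = function (key, values) {
    var count = 0;
    values.forEach(function (value) { count += value; });
    return count;
};

var key = { pair: ["pA", "pB"] };
// All three emitted 1s reduced at once...
var oneShot = reducef(key, [1, 1, 1]);
// ...or a partial result fed back in together with a later value (re-reduce).
var inStages = reducef(key, [reducef(key, [1, 1]), 1]);
console.log(oneShot, inStages); // prints "3 3"
```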

So, for this example collection:

> db.data.find()
{ "_id" : "Job01", "packages" : [ "pA", "pB", "pC" ] }
{ "_id" : "Job02", "packages" : [ "pA", "pC" ] }
{ "_id" : "Job03", "packages" : [ "pA", "pB", "pD", "pE" ] }

we get the following result:

> db.data.mapReduce(
...     mapf,
...     reducef,
...     { out: 'pairs' }
... );
{
    "result" : "pairs",
    "timeMillis" : 443,
    "counts" : {
        "input" : 3,
        "emit" : 10,
        "reduce" : 2,
        "output" : 8
    },
    "ok" : 1
}
> db.pairs.find()
{ "_id" : { "pair" : [ "pA", "pB" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pC" ] }, "value" : 2 }
{ "_id" : { "pair" : [ "pA", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pA", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pC" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pD" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pB", "pE" ] }, "value" : 1 }
{ "_id" : { "pair" : [ "pD", "pE" ] }, "value" : 1 }
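To experiment without a server, the run above can be imitated in plain JavaScript. This is an offline sketch, not MongoDB's implementation: the hypothetical helper runMapReduce calls the map function once per document with a stubbed emit(), groups the emitted values by key, and calls reduce only for keys that received more than one value (MongoDB likewise does not call reduce for a key with a single value). For this small input its counts line up with the shell output above.

```javascript
// Offline imitation of db.data.mapReduce(mapf, reducef, ...) in Node.js.
// runMapReduce is a hypothetical helper, not a MongoDB API.
let emit; // MongoDB provides emit(); here runMapReduce installs a stub

function mapf() {
  const that = this;
  this.packages.forEach(function (p1) {
    that.packages.forEach(function (p2) {
      if (p1 < p2) emit({ pair: [p1, p2] }, 1);
    });
  });
}

function reducef(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON of key -> { key, values }
  let emitted = 0;
  emit = (key, value) => {
    emitted += 1;
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((doc) => map.call(doc)); // "this" is the document
  let reduceCalls = 0;
  const output = [...groups.values()].map(({ key, values }) => {
    if (values.length === 1) return { _id: key, value: values[0] }; // no reduce
    reduceCalls += 1;
    return { _id: key, value: reduce(key, values) };
  });
  const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  output.sort((x, y) => cmp(JSON.stringify(x._id), JSON.stringify(y._id)));
  return {
    counts: { input: docs.length, emit: emitted, reduce: reduceCalls, output: output.length },
    output,
  };
}

const result = runMapReduce(
  [
    { _id: "Job01", packages: ["pA", "pB", "pC"] },
    { _id: "Job02", packages: ["pA", "pC"] },
    { _id: "Job03", packages: ["pA", "pB", "pD", "pE"] },
  ],
  mapf,
  reducef
);
console.log(result.counts); // { input: 3, emit: 10, reduce: 2, output: 8 }
result.output.forEach((d) => console.log(JSON.stringify(d)));
```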

For more information on mapReduce, consult: http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/ and http://docs.mongodb.org/manual/applications/map-reduce/
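Note that mapReduce has been deprecated since MongoDB 5.0 in favour of the aggregation pipeline. The following is an untested sketch of a pipeline that should yield the same { _id: { pair: [...] }, value: n } documents: it copies the packages array, unwinds both copies to get every combination within a job, keeps p1 < p2 as the map function does, and counts per pair. It assumes MongoDB 3.6 or later (for $expr inside $match); the variable name pairCountPipeline is illustrative.

```javascript
// Aggregation-pipeline sketch of the same pair count. In the shell:
//   db.data.aggregate(pairCountPipeline)
// Assumes MongoDB >= 3.6 for $expr in $match.
const pairCountPipeline = [
  // Two copies of the array so each job's packages can be paired with themselves.
  { $project: { p1: "$packages", p2: "$packages" } },
  // One document per (p1, p2) combination inside the same job.
  { $unwind: "$p1" },
  { $unwind: "$p2" },
  // Keep each unordered pair once, like p1 < p2 in the map function.
  { $match: { $expr: { $lt: ["$p1", "$p2"] } } },
  // Count per pair; field names mirror the mapReduce output.
  { $group: { _id: { pair: ["$p1", "$p2"] }, value: { $sum: 1 } } },
  { $sort: { _id: 1 } },
];
```

Appending { $out: "pairs" } as a last stage would write the result to a collection, much as out: 'pairs' does for mapReduce.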

landons · Accepted Answer · 2013-03-08 02:15:47Z


You can't. Mongo doesn't do joins. Switching from SQL to Mongo is a lot more involved than migrating your queries.

Typically, you would include all the pertinent information in the same record (rather than normalize the information and select it with a join). Denormalize!
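One way to read the "cache that value" suggestion from the comments below is to keep a separate collection of pair counters and bump them every time a job is saved, so the heatmap becomes a plain find() at read time. A minimal sketch, assuming a hypothetical pairCounts collection and helper pairIncrementOps; it only builds the list of updateOne operations (with $inc and upsert: true) in the form bulkWrite() accepts.

```javascript
// Hypothetical helper: build bulkWrite() operations that add 1 to the counter
// of every unordered package pair in one job.
function pairIncrementOps(packages) {
  // Remove duplicates, then sort so each pair is stored as [smaller, larger].
  const unique = [...new Set(packages)].sort();
  const ops = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      ops.push({
        updateOne: {
          filter: { _id: { pair: [unique[i], unique[j]] } },
          update: { $inc: { value: 1 } }, // creates value: 1 on first upsert
          upsert: true,
        },
      });
    }
  }
  return ops;
}

// In the shell, after saving a job document `job` (hypothetical), and only
// when the list is non-empty, since drivers typically reject an empty batch:
//   db.pairCounts.bulkWrite(pairIncrementOps(job.packages));
```

Design choice: duplicates are removed before pairing, so a package listed twice in one job is counted once; the mapReduce version would count such pairs twice.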


5 Comments

amber4478 Over a year ago

So what you are telling me is that there is no way to query a MongoDB database to count the number of times two packages are used as part of the same job? Somehow I find that hard to believe.

landons Over a year ago

That's not what I'm saying. I'm saying you would actually cache that value and save it with the job record in question (I'm still a little fuzzy on whether you're calculating this for a specific job or for any job with two packages).

amber4478 Over a year ago

For any job that runs multiple packages, I would like to store the count for each pair of packages used in the same job. For example, if packageA and packageB are used together in 20 jobs, that pair's count should be 20. I would like a query that returns, for each pair of packages, the number of jobs in which both are used.

amber4478 Over a year ago

I am not sure who upvoted this, but it definitely was not a good answer: I know that a query like the one I am requesting is possible, and I was not migrating a database; this is a new database.

landons Over a year ago

No, seriously. It's not. You would need multiple queries and application logic to get the results you want, unless you denormalize.
