3

I'm evaluating MongoDB and I want to see how capable it is in terms of querying.

With my data sets, I may need to compare one field's value against another field's. The best way to explain is with an example.

In the following JSON, I want to return documents with at least one person whose age is less than 30 and whose country's population is more than 100M:

{
  people: [
    { name: "Feyyaz", age: 28, country: "Turkiye" },
    { name: "Joseph", age: 25, country: "USA" },
    ...
  ],
  countries: [
    { name: "Turkiye", population: 75000000 },
    { name: "USA", population: 300000000 },
    ...
  ]
}

Note: the example is completely made up, because my real-world data is much more complicated. Changing the structure should be the last resort.

3
  • Can you please show us the code you have tried? The best place to start is the MongoDB refdoc. Trying to write the code can help you clarify your question; we will not code for you. Commented Jul 18, 2014 at 11:20
  • @dgiugg Not something everyone would know how to do, and the question clearly states that this is an evaluation. There is a clear problem to solve here when you look at it that defies code for the new initiate. Commented Jul 18, 2014 at 11:27
  • @dgiugg, see Neil's answer below. As a person evaluating a product, I think it's hard to even attempt. Try to give constructive answers! Commented Jul 18, 2014 at 12:33

3 Answers

3

If you can use Python for this job, then you can consider using the query language ObjectPath. It allows you to complete the job in one line, like this:

$.people[@.age<30 and $.countries[@.name is @@.country].population > 100000000]

Note, however, that "@@" has not been implemented yet. If you'd like to use it, you can file a feature request on the GitHub page.

Disclaimer: there is a plan to integrate this language with MongoDB in the near future, so that it can leverage MongoDB's distributed capabilities.


1

Standard query operations with .find() will not match two fields against each other in the way you are asking. You can get "close" results with standard match conditions, but comparing the elements of one array with the elements of another is a little more advanced.
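To see why independent conditions only get "close", here is a plain JavaScript sketch that needs no database. The document `looseDoc` and the helpers `looseMatch` and `joinedMatch` are made up for illustration: `looseMatch` mimics what two separate array conditions test, while `joinedMatch` is what the question actually asks for.

```javascript
// Hypothetical document: the only person under 30 lives in a country
// that is below the population threshold.
var looseDoc = {
  people: [
    { name: "Feyyaz", age: 28, country: "Turkiye" },
    { name: "Joseph", age: 35, country: "USA" }
  ],
  countries: [
    { name: "Turkiye", population: 75000000 },
    { name: "USA", population: 300000000 }
  ]
};

// Mimics { "people.age": { $lt: 30 }, "countries.population": { $gt: 100000000 } }:
// each condition is satisfied by some element, independently of the other.
function looseMatch(d) {
  return d.people.some(function (p) { return p.age < 30; }) &&
         d.countries.some(function (c) { return c.population > 100000000; });
}

// What the question asks for: the same person must be under 30 AND
// live in a country with more than 100M people.
function joinedMatch(d) {
  return d.people.some(function (p) {
    return p.age < 30 && d.countries.some(function (c) {
      return c.name === p.country && c.population > 100000000;
    });
  });
}

console.log(looseMatch(looseDoc));  // true
console.log(joinedMatch(looseDoc)); // false
```

The loose check accepts this document even though no single person satisfies both conditions, which is why an extra comparison step is needed.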

The "advanced swiss army knife" you are looking for comes in the form of the aggregation framework for MongoDB. This does a whole lot more than just "aggregate" data, as it is also the tool for general document manipulation and evaluation:

db.pop.aggregate([

  // Match possible documents to reduce work
  { "$match": {
    "people.age": { "$lt": 30 },
    "countries.population": { "$gt": 100000000 }
  }},

  // Test the conditions against the arrays      
  { "$project": {
    "people": 1,
    "countries": 1,
    "match": {
      "$anyElementTrue": {
        "$map": {
          "input": "$people",
          "as": "p",
          "in": {
            "$anyElementTrue": {
              "$map": {
                "input": "$countries",
                "as": "c",
                "in": {
                  "$and": [
                    { "$lt": [ "$$p.age",30 ] },
                    { "$gt": [ "$$c.population",100000000 ] },
                    { "$eq": [ "$$p.country", "$$c.name" ] }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }},

  // Filter any documents that did not match
  { "$match": { "match": true }}
])
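For readers new to the operators, the nested $map / $anyElementTrue expression above can be read as two nested "does any element satisfy this?" loops. The following is a plain JavaScript analogue, not MongoDB code; `anyElementTrue`, `matchFlag` and `sampleDoc` are illustrative names, and the sample data is taken from the question.

```javascript
// Analogue of { $anyElementTrue: { $map: { input: ..., as: ..., in: ... } } }:
// map every element to a value, then ask whether any value is truthy.
function anyElementTrue(input, fn) {
  return input.map(fn).some(function (v) { return Boolean(v); });
}

// Analogue of the "match" field computed in the $project stage.
function matchFlag(d) {
  return anyElementTrue(d.people, function (p) {
    return anyElementTrue(d.countries, function (c) {
      return p.age < 30 &&
             c.population > 100000000 &&
             p.country === c.name;
    });
  });
}

var sampleDoc = {
  people: [
    { name: "Feyyaz", age: 28, country: "Turkiye" },
    { name: "Joseph", age: 25, country: "USA" }
  ],
  countries: [
    { name: "Turkiye", population: 75000000 },
    { name: "USA", population: 300000000 }
  ]
};

console.log(matchFlag(sampleDoc)); // true (Joseph, 25, USA)
```

The final $match stage then corresponds to keeping only the documents for which this flag is true.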

If you want to filter the arrays down to just the matching elements, then you can do this a little differently. I'll break it up into separate $project stages, but you could do it in one:

db.pop.aggregate([

  // Match possible documents to reduce work
  { "$match": {
    "people.age": { "$lt": 30 },
    "countries.population": { "$gt": 100000000 }
  }},

  // Filter the people array for matches
  { "$project": {
    "people": {
      "$setDifference": [
        { "$map": {
          "input": "$people",
          "as": "p",
          "in": {
            "$cond": [
              { "$and": [
                { "$lt": [ "$$p.age", 30 ] },
                {
                  "$anyElementTrue": {
                    "$map": {
                      "input": "$countries",
                      "as": "c",
                      "in": {
                        "$and": [
                          { "$gt": [ "$$c.population", 100000000 ] },
                          { "$eq": [ "$$p.country", "$$c.name" ] }
                        ]
                      }
                    }
                  }
                }
              ]},
              "$$p",
              false
            ]
          }
        }},
        [false]
      ]
    },
    "countries": 1
  }},

  // Discard any document whose filtered "people" array is empty
  { "$match": { "people": { "$ne": [] } }},

  // Filter the countries to matching people
  { "$project": {
    "people": 1,
    "countries": {
      "$setDifference": [
        { "$map": {
          "input": "$countries",
          "as": "c",
          "in": {
            "$cond": [
              { "$and": [
                { "$gt": [ "$$c.population", 100000000 ] },
                {
                  "$anyElementTrue": {
                    "$map": {
                      "input": "$people",
                      "as": "p",
                      "in": {
                        "$eq": [ "$$p.country", "$$c.name" ]
                      }
                    }                    
                  }
                }
              ]},
              "$$c",
              false
            ]
          }
        }},
        [false]
      ]
    }
  }}
])

In the second case, you get documents with the non-matching array elements filtered out, like this:

{
    "_id" : ObjectId("53c8f1645117367f5ff2036c"),
    "people" : [
            {
                    "name" : "Joseph",
                    "age" : 25,
                    "country" : "USA"
            }
    ],
    "countries" : [
            {
                    "name" : "USA",
                    "population" : 300000000
            }
    ]
}
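The same filtering logic can be sketched in plain JavaScript to make the intent of the two $project stages easier to follow. This is only an analogue under simplifying assumptions: `filterDoc` and `inputDoc` are illustrative names, and Array.prototype.filter keeps order and duplicates, whereas $setDifference treats the arrays as sets.

```javascript
function filterDoc(d) {
  // Keep people under 30 whose country has more than 100M inhabitants.
  var people = d.people.filter(function (p) {
    return p.age < 30 && d.countries.some(function (c) {
      return c.population > 100000000 && c.name === p.country;
    });
  });

  // Keep countries over 100M that still have at least one matching person.
  var countries = d.countries.filter(function (c) {
    return c.population > 100000000 && people.some(function (p) {
      return p.country === c.name;
    });
  });

  return { people: people, countries: countries };
}

// Sample data from the question.
var inputDoc = {
  people: [
    { name: "Feyyaz", age: 28, country: "Turkiye" },
    { name: "Joseph", age: 25, country: "USA" }
  ],
  countries: [
    { name: "Turkiye", population: 75000000 },
    { name: "USA", population: 300000000 }
  ]
};

console.log(JSON.stringify(filterDoc(inputDoc)));
```

For the sample data this leaves only Joseph and the USA, the same elements as in the result shown above.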

Pretty powerful stuff.

Also see the aggregation framework operators and other aggregation samples in the documentation.

You can do similar things using mapReduce as well, but the aggregation framework is generally preferred: it is a native code implementation, whereas MongoDB mapReduce relies on JavaScript interpretation to run.

4 Comments

  • Thank you for the answer. That means these kinds of queries are possible, but with a lot of effort and a hard-to-read script :). By the way, it seems correct, but it's not working in try.mongodb.org. Probably a syntax issue. It would be nice if you could fix it.
  • @FeyyazE The syntax is for MongoDB 2.6 and upwards only. It is still possible to do in earlier versions, but with a much more confusing listing. You are "evaluating", so "evaluate". Download, install and play with the latest release, MongoDB 2.6.3 at the time of writing.
  • @FeyyazE I ran the code myself before posting, so I know that it works. The result posted is straight from my console. Did you remove the comments? That should work anyway, though. Check your installation. Otherwise, you appear to have changed something.
  • I do have v2.6.3 installed, but I just realized that try.mongodb.org does not support all commands, or it's not up to date. It's working on my computer, thanks.
0

Regarding FeyyazE's comment on Neil Lunn's answer: you can also use standard JavaScript with classic, easy-to-read functions, like this:

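// Strict comparisons, as in the question: "less than 30", "more than 100M"
function test1 (field) {return field < 30;}
function test2 (field) {return field > 100000000;}

// True if some element of array1 passes test1 and is linked, through the
// pivot fields, to an element of array2 that passes test2
var fct = function (array1, field1, pivot1, array2, field2, pivot2) {
    for (var i = 0; i < array1.length; i++) {
        if (test1(array1[i][field1])) {
            for (var j = 0; j < array2.length; j++) {
                if (array2[j][pivot2] == array1[i][pivot1] && test2(array2[j][field2])) {
                    return true;
                }
            }
        }
    }
    return false;
};

// $where is evaluated on the server, so test1, test2 and fct must be
// available there (e.g. saved in db.system.js). A string literal cannot
// span several lines, so the expression is written on one line.
db.test.find({$where: "fct(this.people, 'age', 'country', this.countries, 'population', 'name')"});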

But MongoDB takes a long time to evaluate this. I tried it in the shell on a small collection of 100K documents and it took 3 seconds! So you may prefer the effort and the hard-to-read script after all.

1 Comment

I don't think anyone is talking about one document here but 10000's
