I need a MongoDB query or aggregation: how can I group documents by different values held in an array field? Example:

If I have these objects:

> db.respondents.insert({person: 1, responses: [{question: 'How old are you?', response: '18-40 yrs'}, {question: 'What is the brand of your car?', response: 'Fiat'} ] } )
> db.respondents.insert({person: 2, responses: [{question: 'How old are you?', response: '18-40 yrs'}, {question: 'What is the brand of your car?', response: 'Volvo'} ] } )
> db.respondents.insert({person: 3, responses: [{question: 'How old are you?', response: '41-65 yrs'}, {question: 'What is the brand of your car?', response: 'Volvo'} ] } )
> db.respondents.insert({person: 4, responses: [{question: 'How old are you?', response: '41-65 yrs'}, {question: 'What is the brand of your car?', response: 'Volvo'} ] } )

I would like to write a query that tells me which car brand respondents own (the base question) per age group (the breakdown question).

So the answer should tell me:

1 person in age group '18-40' responded 'Fiat' to question 'What is the brand of your car?'

1 person in age group '18-40' responded 'Volvo' to question 'What is the brand of your car?'

2 people in age group '41-65' responded 'Volvo' to question 'What is the brand of your car?'

And IRL:

  • There are 100,000+ respondents
  • There are about 30 'responses' per respondent
  • MongoDB 3.0.9 is used

I've tried numerous ways but won't bore you with my failures....

2 Answers

It's a pity you don't have MongoDB 3.2, since operators like $arrayElemAt and $filter make this a simple process with a single $group stage:

db.respondents.aggregate([
  { "$match": { 
    "responses.question": { 
      "$all": [
        "How old are you?",
        "What is the brand of your car?"
      ]
    } 
  }},
  { "$group": {
    "_id": {
      "age": {
        "$arrayElemAt": [
          { "$map": {
            "input": { "$filter": {
              "input": "$responses",
              "as": "res",
              "cond": {
                "$eq": [ "$$res.question", "How old are you?" ]
              }
            }},
            "as": "res",
            "in": "$$res.response"
          }},
          0
        ]
      },
      "car": {
        "$arrayElemAt": [
          { "$map": {
            "input": { "$filter": {
              "input": "$responses",
              "as": "res",
              "cond": {
                "$eq": [ "$$res.question", "What is the brand of your car?" ]
              }
            }},
            "as": "res",
            "in": "$$res.response"
          }},
          0
        ]
      }
    },
    "count": { "$sum": 1 }
  }}
])

In earlier versions you need to $unwind the content and then conditionally select the required response values via $cond:

db.respondents.aggregate([
  { "$match": { 
    "responses.question": { 
      "$all": [
        "How old are you?",
        "What is the brand of your car?"
      ]
    } 
  }},
  { "$unwind": "$responses" },
  { "$match": { 
    "responses.question": { 
      "$in": [
        "How old are you?",
        "What is the brand of your car?"
      ]
    } 
  }},
  { "$group": {
    "_id": "$_id",
    "age": {
      "$max": {
        "$cond": [
          { "$eq": [ "$responses.question", "How old are you?" ] },
          "$responses.response",
          null
        ]
      }
    },
    "car": {
      "$max": {
        "$cond": [
          { "$eq": [ "$responses.question", "What is the brand of your car?" ] },
          "$responses.response",
          null
        ]
      }
    }
  }},
  { "$group": {
    "_id": {
      "age": "$age",
      "car": "$car"
    },
    "count": { "$sum": 1 }
  }}
])

Either way it is entirely possible, and both pipelines produce the same results:

{ "_id" : { "age" : "41-65 yrs", "car" : "Volvo" }, "count" : 2 }
{ "_id" : { "age" : "18-40 yrs", "car" : "Volvo" }, "count" : 1 }
{ "_id" : { "age" : "18-40 yrs", "car" : "Fiat" }, "count" : 1 }
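
If you want to check the logic of the pre-3.2 pipeline without a server, the following is a rough plain JavaScript sketch of what its stages do. It is not MongoDB API: `countByAgeAndCar` and `maxIgnoringNull` are hypothetical helpers written only for this illustration, `person` stands in for `_id` because the sample documents are shown without one, and the initial `$all` match is skipped because every sample respondent answers both questions.

```javascript
// Sample documents copied from the question.
const respondents = [
  { person: 1, responses: [
    { question: 'How old are you?', response: '18-40 yrs' },
    { question: 'What is the brand of your car?', response: 'Fiat' } ] },
  { person: 2, responses: [
    { question: 'How old are you?', response: '18-40 yrs' },
    { question: 'What is the brand of your car?', response: 'Volvo' } ] },
  { person: 3, responses: [
    { question: 'How old are you?', response: '41-65 yrs' },
    { question: 'What is the brand of your car?', response: 'Volvo' } ] },
  { person: 4, responses: [
    { question: 'How old are you?', response: '41-65 yrs' },
    { question: 'What is the brand of your car?', response: 'Volvo' } ] }
];

const AGE_Q = 'How old are you?';
const CAR_Q = 'What is the brand of your car?';

// Hypothetical helper: stands in for $max, which skips null values.
function maxIgnoringNull(a, b) {
  if (a === null) return b;
  if (b === null) return a;
  return a > b ? a : b;
}

function countByAgeAndCar(docs) {
  // $unwind: one row per (respondent, response) pair.
  const rows = [];
  for (const doc of docs) {
    for (const r of doc.responses) {
      rows.push({ id: doc.person, question: r.question, response: r.response });
    }
  }

  // Second $match plus first $group: fold the rows back into one
  // record per respondent, keeping only the two answers of interest.
  const perPerson = new Map();
  for (const row of rows) {
    if (row.question !== AGE_Q && row.question !== CAR_Q) continue;
    const rec = perPerson.get(row.id) || { age: null, car: null };
    // $cond yields the response for the matching question, else null.
    rec.age = maxIgnoringNull(rec.age, row.question === AGE_Q ? row.response : null);
    rec.car = maxIgnoringNull(rec.car, row.question === CAR_Q ? row.response : null);
    perPerson.set(row.id, rec);
  }

  // Second $group: count respondents per (age, car) combination.
  const counts = new Map();
  for (const { age, car } of perPerson.values()) {
    const key = JSON.stringify({ age, car });
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

const result = countByAgeAndCar(respondents);
console.log(result);
```

For the four sample documents this gives three groups with counts of 1, 1 and 2, matching the aggregation output shown above.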

2 Comments

I hereby ignore the SO comment guidelines and write: +1, thanks - you're my hero! Sweetness. Exactly what I was trying to achieve.
Excuse me - meant 'heroine' of course. I bet it wasn't moonlight yesterday.

I see no straightforward way to do it. But! You may do this:

db.respondents.aggregate([
  {$unwind:'$responses'},
  {$match:{'responses.question':'How old are you?'}}
]).forEach(function(resp){
  db.responses.update({_id:resp._id},{$set:{ageGroup:resp.responses.response}});
})

It may take a while to run, but afterwards you'll have a convenient ageGroup field that you can use for grouping.

1 Comment

Of course it's possible and quite straightforward, and there is absolutely no need to write data to another collection. It's better and far more efficient with a modern release, but still can be done in any version without multiple queries or client looping.
