
We're making a query where the results returned should be a list of suggested search terms.

We currently have a query that checks for a regex match on multiple fields:

$or:[ 
{'description.position':/s/i}, 
{'employer.name':/s/i}, 
{'hiringManager.profile.name':/s/i}
]

We'd like the returned results to be an array of unique matches (no duplicates).

The results returned look something like:

{
  "fields": {
    "hiringManager": {
      "profile": {
        "name": "Seth Sandler"
      }
    },
    "description": {
      "position": "Cook"
    },
    "employer": {
      "name": "Employer"
    }
  }
}
{
  "fields": {
    "hiringManager": {
      "profile": {
        "name": "Seth Sandler"
      }
    },
    "description": {
      "position": "Cook"
    },
    "employer": {
      "name": "Employer 4"
    }
  }
}
{
  "fields": {
    "hiringManager": {
      "profile": {
        "name": "Seth Sandler"
      }
    },
    "description": {
      "position": "Chef"
    },
    "employer": {
      "name": "Emplopyer 3"
    }
  }
}
{
  "fields": {
    "hiringManager": {
      "profile": {
        "name": "Seth Sandler"
      }
    },
    "description": {
      "position": "Chef"
    },
    "employer": {
      "name": "Employer"
    }
  }
}

We'd like the results to instead be a single array of the unique values of hiringManager.profile.name, employer.name, and description.position.
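To pin down the target shape, here is a plain-JavaScript sketch (not MongoDB code) of the de-duplicated array we're after. It runs on documents that mirror the four in the log above. It does the work in application code after fetching, so it only illustrates the expected output. It is not the aggregation-based approach we're asking about.

```javascript
// Plain-JavaScript sketch (not MongoDB code): one de-duplicated array of the
// values of three nested fields. Sample documents mirror the log above.
const docs = [
  { hiringManager: { profile: { name: "Seth Sandler" } }, description: { position: "Cook" }, employer: { name: "Employer" } },
  { hiringManager: { profile: { name: "Seth Sandler" } }, description: { position: "Cook" }, employer: { name: "Employer 4" } },
  { hiringManager: { profile: { name: "Seth Sandler" } }, description: { position: "Chef" }, employer: { name: "Emplopyer 3" } },
  { hiringManager: { profile: { name: "Seth Sandler" } }, description: { position: "Chef" }, employer: { name: "Employer" } },
];

function uniqueTexts(documents) {
  const seen = new Set(); // a Set keeps each value once, in first-seen order
  for (const doc of documents) {
    const values = [
      doc.description?.position,
      doc.employer?.name,
      doc.hiringManager?.profile?.name,
    ];
    for (const value of values) {
      if (typeof value === "string") seen.add(value); // skip missing fields
    }
  }
  return [...seen];
}

console.log(uniqueTexts(docs));
// [ 'Cook', 'Employer', 'Seth Sandler', 'Employer 4', 'Chef', 'Emplopyer 3' ]
```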

Our current solution doesn't seem ideal (it's probably not performant), and we're wondering whether it's possible to use the MongoDB aggregate function to put the field values into an array.

Current solution (not ideal):

db.collection.aggregate([
  {$match: {$or: [{'description.position': /s/i}, {'employer.name': /s/i}, {'hiringManager.profile.name': /s/i}]}},
  // one group for all matched documents, collecting every value of each field
  {$group: {_id: 1,
    positions: {$push: '$description.position'},
    employerNames: {$push: '$employer.name'},
    hiringManagerNames: {$push: '$hiringManager.profile.name'}}},
  {$project: {_id: 1, texts: {$setUnion: ['$positions', {$setUnion: ['$employerNames', '$hiringManagerNames']}]}}}
])

The output of this is correct, but we'd like a better aggregate function where we can limit the results.

{
  "result": [
    {
      "_id": 1,
      "texts": [
        "Employer 5",
        "Employer 4",
        "Employer 1",
        "Manager",
        "Cook",
        "Chef"
      ]
    }
  ]
}
  • So your problem here is that the result is just one big document and you just want the "distinct" "texts" values in a response. Correct? Commented Mar 12, 2015 at 1:31
  • That's correct. The issue is that the distinct values come from 3 different fields (since we're querying 3 fields for the regex match). Commented Mar 12, 2015 at 1:58

2 Answers


It would probably be better to use another technique to get the distinct results: make the "text" the actual "grouping key" of the $group pipeline stage. There is a trick to doing that reasonably efficiently in modern MongoDB versions like yours, i.e. version 2.6 or greater:

db.collection.aggregate([
    { "$match": {
        "$or":[
            { "description.position":/s/i },
            { "employer.name":/s/i},
            { "hiringManager.profile.name":/s/i }
        ]
    }},
    { "$project": {
        "_id": { 
            "$setDifference": [
                { "$map": {
                    "input": { "$literal": ["A","B","C" ] },
                     "as": "type",
                    "in": { "$cond": [
                        { "$eq": [ "$$type", "A" ] },
                        "$description.position",
                        { "$cond": [
                            { "$eq": [ "$$type", "B" ] },
                            "$employer.name",
                            "$hiringManager.profile.name"
                        ]}
                    ]}
                }},
                [null] 
            ]
        }
    }},
    { "$unwind": "$_id" },
    { "$group": { "_id": "$_id" } }
])

$map is used here as a "switch": it processes the $literal array ["A","B","C"], and for each of those elements the nested $cond chooses the appropriate field as the output value.
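One rough way to picture that "switch" is the plain-JavaScript emulation below. `mapSwitch` is a hypothetical helper, not MongoDB code. It assumes a missing field ends up as null in the mapped array, which is what the [null] in the pipeline implies.

```javascript
// Plain-JavaScript emulation (hypothetical helper, not MongoDB code) of
// { $map: { input: ["A","B","C"], as: "type", in: { $cond: ... } } }:
// each element of the literal array selects one field of the document.
function mapSwitch(doc) {
  return ["A", "B", "C"].map((type) => {
    let value;
    if (type === "A") value = doc.description?.position;    // first $cond branch
    else if (type === "B") value = doc.employer?.name;       // nested $cond branch
    else value = doc.hiringManager?.profile?.name;           // fallback branch
    // Assumption: a missing field shows up as null inside the output array.
    return value ?? null;
  });
}

const doc = {
  hiringManager: { profile: { name: "Seth Sandler" } },
  description: { position: "Cook" },
  employer: { name: "Employer" },
};
console.log(mapSwitch(doc));                          // [ 'Cook', 'Employer', 'Seth Sandler' ]
console.log(mapSwitch({ employer: { name: "X" } }));  // [ null, 'X', null ]
```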

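In case any of those values is null, or is even a duplicate of another value in the same document, the $setDifference operator sorts that out: subtracting [null] removes the nulls, and because the result is a set, duplicates are dropped as well.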
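The behaviour of `{ "$setDifference": [ <mapped array>, [null] ] }` can be sketched in plain JavaScript as follows. This is an emulation, not MongoDB code. MongoDB leaves the order of a set result unspecified, while this sketch happens to keep first-seen order.

```javascript
// Plain-JavaScript emulation (not MongoDB code) of $setDifference:
// values found in the second array are removed, and the result is a set,
// so duplicates collapse. Order here is first-seen; MongoDB's is unspecified.
function setDifference(values, exclude) {
  const excluded = new Set(exclude);
  return [...new Set(values)].filter((value) => !excluded.has(value));
}

console.log(setDifference(["Cook", "Employer", "Seth Sandler"], [null])); // unchanged
console.log(setDifference(["Chef", null, "Chef"], [null]));                // [ 'Chef' ]
```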

The resulting array in each document is processed with $unwind so that its elements can then be passed as the grouping key to $group, which results in one distinct document for each "text" term.

Of course, the trade-off is that $unwind multiplies the documents in the pipeline. Each matched document can produce up to three (one per field), so until the distinct $group there are more documents in the pipeline than the query matched. There is a cost involved when using $unwind.
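A plain-JavaScript emulation (not MongoDB code) of the last two stages makes both the grouping and the cost visible. It assumes the four sample documents from the question all matched and that their mapped arrays contain no nulls. The helper names `unwind` and `groupById` are hypothetical.

```javascript
// Plain-JavaScript emulation (not MongoDB code) of
// { $unwind: "$_id" } followed by { $group: { _id: "$_id" } }.
// Input: the arrays $project would produce for the four sample documents.
const projected = [
  { _id: ["Cook", "Employer", "Seth Sandler"] },
  { _id: ["Cook", "Employer 4", "Seth Sandler"] },
  { _id: ["Chef", "Emplopyer 3", "Seth Sandler"] },
  { _id: ["Chef", "Employer", "Seth Sandler"] },
];

// $unwind: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((value) => ({ ...doc, [field]: value }))
  );
}

// $group on _id: one output document per distinct value.
// (MongoDB does not guarantee $group output order; a Map keeps first-seen order.)
function groupById(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc._id)) groups.set(doc._id, { _id: doc._id });
  }
  return [...groups.values()];
}

const unwound = unwind(projected, "_id");
const distinct = groupById(unwound);
console.log(unwound.length);  // 12 documents in the pipeline after $unwind
console.log(distinct.length); // 6 distinct terms after $group
```

Four matched documents become twelve in the pipeline before $group reduces them to six terms. That is the multiplication described above.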

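The benefit is that each term is a separate document in the results. When the output is returned through a cursor, the total can therefore grow beyond the 16MB limit that a single document holding all the "texts" would be bound by. Of course, that's a lot of text to begin with.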

One other note on your existing aggregation operation: since you already rely on $setUnion to combine the fields and get distinct values, you may as well "reduce" the input arrays by using $addToSet instead of $push. This avoids growing the arrays with duplicates that you would end up removing anyway.
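To make the difference concrete, this plain-JavaScript emulation (not MongoDB code) applies both accumulators to the description.position values of the four sample documents. MongoDB does not guarantee the order of a $addToSet array; the sketch simply keeps first-seen order.

```javascript
// Plain-JavaScript emulation (not MongoDB code) of two $group accumulators.
const positions = ["Cook", "Cook", "Chef", "Chef"]; // description.position of the samples

const pushed = [];     // like { $push: "$description.position" }: every value
const addedToSet = []; // like { $addToSet: "$description.position" }: distinct values only
for (const position of positions) {
  pushed.push(position);
  if (!addedToSet.includes(position)) addedToSet.push(position);
}

console.log(pushed);     // [ 'Cook', 'Cook', 'Chef', 'Chef' ]
console.log(addedToSet); // [ 'Cook', 'Chef' ]
```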

The same $setDifference operation should also be considered, since your $or condition does not guarantee that "all" of the fields contain a valid string or are even present. Where not all fields are valid, you would also receive a distinct result of null along with the other text terms.
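Applied to the question's pipeline, both suggestions might look like the sketch below. It is written as a plain JavaScript array so it can be inspected on its own. It assumes the mongo shell and the placeholder collection name `collection` used in the answer's code, so the actual call is left as a comment. $setUnion takes any number of arrays, so the nested $setUnion is folded into a single call.

```javascript
// Sketch: the question's pipeline with $addToSet and $setDifference applied.
// In the mongo shell: db.collection.aggregate(pipeline); "collection" is a placeholder.
const pipeline = [
  { $match: { $or: [
    { "description.position": /s/i },
    { "employer.name": /s/i },
    { "hiringManager.profile.name": /s/i },
  ] } },
  // $addToSet instead of $push: each array only ever holds distinct values.
  { $group: {
    _id: 1,
    positions: { $addToSet: "$description.position" },
    employerNames: { $addToSet: "$employer.name" },
    hiringManagerNames: { $addToSet: "$hiringManager.profile.name" },
  } },
  // Union the three sets, then subtract [null] so null is never suggested.
  { $project: {
    _id: 1,
    texts: { $setDifference: [
      { $setUnion: ["$positions", "$employerNames", "$hiringManagerNames"] },
      [null],
    ] },
  } },
];

// db.collection.aggregate(pipeline);
console.log(JSON.stringify(pipeline[2]));
```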

So it's about weighing up which is more important to you. The present operation is likely to be faster and less resource-intensive (with the modifications mentioned), but the alternative caters for larger and possibly more palatable responses. It also allows you to "limit" the results and maybe even "count" the occurrences of those "text" values.
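As one hypothetical way to use that flexibility (an assumption, not part of the answer), the final `{ "$group": { "_id": "$_id" } }` of the $unwind pipeline could be swapped for stages that count, sort and limit. The sketch below defines those stages alongside a plain-JavaScript emulation of them. The emulation is run on the twelve values the sample documents produce after $unwind, and `countSortLimit` is a hypothetical helper.

```javascript
// Hypothetical replacement for the final { $group: { _id: "$_id" } } stage:
// count each term, most frequent first (ties by term), keep at most 10.
const countStages = [
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1, _id: 1 } },
  { $limit: 10 },
];

// Plain-JavaScript emulation (not MongoDB code) of those three stages.
function countSortLimit(terms, limit) {
  const counts = new Map();
  for (const term of terms) counts.set(term, (counts.get(term) ?? 0) + 1);
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count || (a._id < b._id ? -1 : a._id > b._id ? 1 : 0))
    .slice(0, limit);
}

// The twelve values the four sample documents produce after $unwind.
const unwoundTerms = [
  "Cook", "Employer", "Seth Sandler",
  "Cook", "Employer 4", "Seth Sandler",
  "Chef", "Emplopyer 3", "Seth Sandler",
  "Chef", "Employer", "Seth Sandler",
];
console.log(countSortLimit(unwoundTerms, 3));
// [ { _id: 'Seth Sandler', count: 4 }, { _id: 'Chef', count: 2 }, { _id: 'Cook', count: 2 } ]
```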


5 Comments

Thanks Neil. I'm hoping to test this later. It's saying there's an extra bracket somewhere, which I'll look into, but this looks like the solution we were looking for.
@user1218464 Possibly. I just typed it in here. I will also check the syntax.
@user1218464 Ah. Was missing a comma after the $project stage.
After testing this some more, I'm not sure the result is actually what we want (it's close, though). This result gives us the unique $description.position values if the search term matches any of the 3 fields. That is, if the search query matches on the field $hiringManager.profile.name, the value returned is not the hiring manager's name (what we want) but $description.position in all cases. What we want is: if the search matches one of the 3 fields, return the value of the matching field, and then group these so that no result appears more than once.
To rephrase: the goal is to suggest search terms. The suggested terms should be values that match the regex in the $description.position, $hiringManager.profile.name, or $employer.name fields.

@Neil's answer is close, but it seemed like another match was needed to ensure the results matched the original regex. I'm not sure if this is a good solution, but here's a new working aggregation. It seemed to work without $setDifference, so I'm not sure whether that's needed.

Basically, I run another $match on the unwound results to ensure each value matches the original regex.

db.collection.aggregate([
  { '$match': {
    '$or': [
      { 'description.position': /s/i },
      { 'employer.name': /s/i },
      { 'hiringManager.profile.name': /s/i }
    ]
  }},
  { '$project': {
    '_id': { '$map': {
      'input': { '$literal': ['A', 'B', 'C'] },
      'as': 'type',
      'in': { '$cond': [
        { '$eq': ['$$type', 'A'] },
        '$description.position',
        { '$cond': [
          { '$eq': ['$$type', 'B'] },
          '$employer.name',
          '$hiringManager.profile.name'
        ]}
      ]}
    }}
  }},
  { '$unwind': '$_id' },
  // keep only the unwound values that themselves match the search regex
  { '$match': { '_id': /s/i } },
  { '$group': { '_id': '$_id' } }
]);
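The effect of that second $match can be sketched in plain JavaScript (an emulation, not MongoDB code). It runs on the twelve values the question's four sample documents produce after $unwind. Every sample document matches only through hiringManager.profile.name. Without the extra $match, the group would therefore also suggest "Cook", "Chef", "Employer" and so on; with it, only the matching term survives. This may also be why $setDifference wasn't needed here: a regex condition does not match null, so the second $match already drops nulls.

```javascript
// Plain-JavaScript emulation (not MongoDB code) of the extra
// { $match: { _id: /s/i } } placed between $unwind and $group.
const search = /s/i; // the same regex as the first $match (no g flag, so test() is stateless)

// The twelve values the four sample documents produce after $unwind.
const unwoundTerms = [
  "Cook", "Employer", "Seth Sandler",
  "Cook", "Employer 4", "Seth Sandler",
  "Chef", "Emplopyer 3", "Seth Sandler",
  "Chef", "Employer", "Seth Sandler",
];

// Keep only strings that match (mirrors a regex query never matching null),
// then de-duplicate as the final $group would.
const suggestions = [
  ...new Set(unwoundTerms.filter((t) => typeof t === "string" && search.test(t))),
];
console.log(suggestions); // [ 'Seth Sandler' ]
```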
