I have a Mongo collection like this:

{
   "_id":ObjectId("55f16650e3cf2242a79656d1"),
   "user_id":11,
   "push":[
      ISODate("2015-09-08T11:14:18.285Z"),
      ISODate("2015-09-08T11:14:18.285Z"),
      ISODate("2015-09-09T11:14:18.285Z"),
      ISODate("2015-09-10T11:14:18.285Z"),
      ISODate("2015-09-10T11:14:18.285Z")
   ]
}
{
   "_id":ObjectId("55f15c78e3cf2242a79656c3"),
   "user_id":12,
   "push":[
      ISODate("2015-09-06T11:14:18.285Z"),
      ISODate("2015-09-05T11:14:18.285Z"),
      ISODate("2015-09-07T11:14:18.285Z"),
      ISODate("2015-09-09T11:14:18.285Z"),
      ISODate("2015-09-09T11:14:18.285Z"),
      ISODate("2015-09-10T11:14:18.285Z"),
      ISODate("2015-09-11T11:14:18.285Z")
   ]
}

How can I find the user_ids where the count of timestamps is < 3, counting only timestamps with date(timestamp) > (currentDate - 5 days), in a single query? I will be using PHP and don't want to load all the documents into memory.

Explanation:

user_id : date       : count
11      : 2015-09-08 : 2
          2015-09-09 : 1
          2015-09-10 : 2

12      : 2015-09-05 : 1
          2015-09-06 : 1
          2015-09-07 : 1
          2015-09-09 : 2
          2015-09-10 : 1
          2015-09-11 : 1

If the date is set to 2015-09-09 (user input), it gives a count of 3 for user_id 11 and a count of 4 for user_id 12. So suppose the count is set to 3 (user input): the query should return 11 (user_id). If the count is set to 2, no user_id qualifies, and if the count is set to 5, it should return both 11 and 12.
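
The arithmetic above can be checked with a small plain-JavaScript sketch (no MongoDB involved). The names `countSince` and `usersWithin` are hypothetical helpers, the cutoff is assumed to be midnight UTC on the chosen date, and the limit is assumed to be inclusive because the examples return user 11 when the count is set to 3.

```javascript
// Sample data from the question, reduced to plain JavaScript values.
const toDates = (days) =>
  days.map((d) => new Date("2015-09-" + d + "T11:14:18.285Z"));

const docs = [
  { user_id: 11, push: toDates(["08", "08", "09", "10", "10"]) },
  { user_id: 12, push: toDates(["06", "05", "07", "09", "09", "10", "11"]) },
];

// Count, per user_id, the timestamps at or after the cutoff.
function countSince(documents, cutoff) {
  const counts = {};
  for (const doc of documents) {
    const recent = doc.push.filter((t) => t >= cutoff).length;
    counts[doc.user_id] = (counts[doc.user_id] || 0) + recent;
  }
  return counts;
}

// Assumption: the limit is inclusive, as implied by the examples above.
function usersWithin(counts, limit) {
  return Object.keys(counts)
    .filter((id) => counts[id] <= limit)
    .map(Number);
}

const counts = countSince(docs, new Date("2015-09-09T00:00:00Z"));
console.log(counts);                 // { '11': 3, '12': 4 }
console.log(usersWithin(counts, 3)); // [ 11 ]
```

With a limit of 2 the list is empty, and with a limit of 5 it contains both 11 and 12.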

  • Ouch! Why would you name a field "push"? This is about as close to deliberately causing obfuscated code as you can get. Seems like someone's "typo". But do you really mean to just find the "count" of array items that is "less than three"? Commented Sep 10, 2015 at 12:25
  • By "push" I mean push_notification, and there are many fields like sms; now it's understood, I guess. These are the timestamps when the push was sent. And yes, the count of those array items which are greater than a particular date. Commented Sep 10, 2015 at 12:33
  • My point was that "resulting code" can often look like { "$push": { "push": "something" } } ( as a JSON representation ) and at any rate look very confusing to the reader. Clear field naming will help the readability of your code. And the other point here is "what date"?. This is completely missing from your question, and why I ask in comments when I think you are being unclear. Please edit to tell us all about your unmentioned "date criteria". Unless all you mean is the basic selection of documents, or do you only want to "count" array entries that occur "within" that timeframe as well? Commented Sep 10, 2015 at 12:37
  • I have updated the question, but previously I wrote "where numberOfTimeStamps is less than 3 in last 5 days" So "5 days" was describing the date criteria Commented Sep 10, 2015 at 12:50
  • Okay then. That does at least make it clear. Please understand the way it is/was presented in the question is/was open to interpretation. And as such can lead to an incorrect result/response. Why I asked. Commented Sep 10, 2015 at 12:56

1 Answer

To solve this you need an aggregation pipeline that first "filters" the documents to those with entries in the "last 5 days", then "sums the count" of the qualifying array items in each document, and finally checks whether the "total" per user is "less than three".

The $size operator of the aggregation framework really helps here, as does $map with some additional filtering via $setDifference to remove the false values that $map returns. Doing this work "in document first" and "within" the required $group stage is the most efficient way to process this:

$result = $collection->aggregate(array(
    // Keep only documents with at least one timestamp in the last 5 days
    array( '$match' => array(
        'push' => array(
            '$gte' => new MongoDate( strtotime('-5 days',time()) )
        )
    )),
    // Per user, total the array entries that fall within the last 5 days
    array( '$group' => array(
        '_id' => '$user_id',
        'count' => array(
            '$sum' => array(
                '$size' => array(
                    '$setDifference' => array(
                        array( '$map' => array(
                            'input' => '$push',
                            'as' => 'time',
                            'in' => array(
                                '$cond' => array(
                                    array( '$gte' => array(
                                        '$$time',
                                        new MongoDate( strtotime('-5 days',time()) )
                                    )),
                                    '$$time',
                                    FALSE
                                )
                            )
                        )),
                        array(FALSE)
                    )
                )
            )
        )
    )),
    // Keep only users whose total is less than three
    array( '$match' => array(
        'count' => array( '$lt' => 3 )
    ))
));

So the first $match finds the "possible" documents that contain array entries meeting the criteria, $group then finds the "total" size of the matched array items per user, and the final $match keeps only the results that are less than three in total size.

For the largely "JavaScript brains" out there ( like myself, well trained into it ) this is basically this construct:

db.collection.aggregate([
    { "$match": {
        "push": {
            "$gte": new Date( new Date().valueOf() - ( 5 * 1000 * 60 * 60 * 24 ))
        }
    }},
    { "$group": {
        "_id": "$user_id",
        "count": {
            "$sum": {
                "$size": {
                    "$setDifference": [
                        { "$map": {
                            "input": "$push",
                            "as": "time",
                            "in": {
                                "$cond": [
                                    { "$gte": [ 
                                        "$$time",
                                        new Date( 
                                            new Date().valueOf() - 
                                            ( 5 * 1000 * 60 * 60 * 24 )
                                        )
                                    ]},
                                    "$$time",
                                    false
                                ]
                            }
                        }},
                        [false]
                    ]
                }
            }
        }
    }},
    { "$match": { "count": { "$lt": 3 } } }
])
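
To see what the `$map` / `$setDifference` / `$size` expression computes for a single document, here is a plain-JavaScript emulation. It is only a sketch, not MongoDB code, and the function names are hypothetical. One detail it makes visible: MongoDB's set operators return unique entries, so identical timestamps inside one array collapse into a single entry before `$size` counts them.

```javascript
// Emulates $map + $cond: keep the timestamp if recent, otherwise false.
function mapRecent(push, cutoff) {
  return push.map((t) => (t >= cutoff ? t : false));
}

// Emulates $size of $setDifference(mapped, [false]).
// A Set of epoch milliseconds stands in for MongoDB's set semantics,
// so duplicate timestamps are counted once.
function sizeViaSetDifference(push, cutoff) {
  const unique = new Set(
    mapRecent(push, cutoff).map((v) => (v === false ? false : v.getTime()))
  );
  unique.delete(false);
  return unique.size;
}

// Counts every matching entry, duplicates included.
function sizeViaPlainFilter(push, cutoff) {
  return push.filter((t) => t >= cutoff).length;
}

// The "push" array of user_id 11 from the question.
const push11 = ["08", "08", "09", "10", "10"].map(
  (d) => new Date("2015-09-" + d + "T11:14:18.285Z")
);
const cutoff = new Date("2015-09-09T00:00:00Z");

console.log(sizeViaSetDifference(push11, cutoff)); // 2
console.log(sizeViaPlainFilter(push11, cutoff));   // 3
```

If repeated identical timestamps must each be counted, the set-based form undercounts them for data like the sample above.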

Also, MongoDB 3.2 and later offer $filter, which replaces the whole $map and $setDifference part of the statement:

db.collection.aggregate([
    { "$match": {
        "push": {
            "$gte": new Date( new Date().valueOf() - ( 5 * 1000 * 60 * 60 * 24 ))
        }
    }},
    { "$group": {
        "_id": "$user_id",
        "count": {
            "$sum": {
                "$size": {
                    "$filter": {
                        "input": "$push",
                        "as": "time",
                        "cond": {
                            "$gte": [
                                "$$time",
                                new Date( 
                                    new Date().valueOf() - 
                                    ( 5 * 1000 * 60 * 60 * 24 )
                                )                       
                            ]
                        }
                    }
                }
            }
        }
    }},
    { "$match": { "count": { "$lt": 3 } } }
])

Note also that the "dates" are probably best calculated "before" the pipeline definition, as a separate variable, so that every stage compares against exactly the same instant.
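
A minimal sketch of that idea: compute the cutoff once, then build the pipeline from it. `buildPipeline` and `FIVE_DAYS_MS` are hypothetical names introduced for illustration, and the stages are the same ones shown in the shell version above.

```javascript
// Build the pipeline from one precomputed cutoff so that the $match and
// the $cond compare against exactly the same instant.
function buildPipeline(cutoff, maxCount) {
  return [
    { $match: { push: { $gte: cutoff } } },
    { $group: {
      _id: "$user_id",
      count: {
        $sum: {
          $size: {
            $setDifference: [
              { $map: {
                input: "$push",
                as: "time",
                in: {
                  $cond: [{ $gte: ["$$time", cutoff] }, "$$time", false]
                }
              } },
              [false]
            ]
          }
        }
      }
    } },
    { $match: { count: { $lt: maxCount } } }
  ];
}

const FIVE_DAYS_MS = 5 * 24 * 60 * 60 * 1000;
const cutoff = new Date(Date.now() - FIVE_DAYS_MS); // calculated once
const pipeline = buildPipeline(cutoff, 3);

// In the mongo shell this would then be run as:
//   db.collection.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Passing the cutoff and the limit as parameters also covers the "user input" case from the question, since neither value is hard-coded into the stages.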

4 Comments

Sir, getting wrong result: docs.google.com/document/d/…
@lalit Go and take a look back at the long line of comments earlier where you explicitly say "number of timestamps is less than 3 in 5 days", which is exactly what is happening here by "filtering" the timestamps in the arrays to only return those that occur within 5 days. Not only are you changing criteria in this "test" but you are also saying you "expect" the "full and unfiltered" results from the array, after I asked you at least twice to clarify that this was not what you meant. Each time "I said it already ... (repeat) ..". You got what you asked for. And "thank you" is the response.
I think I should have written "sum of count of timeStamps < 3 and having date(timestamp) > (currentDate-5)" by count I meant the column name in the explanation section. sorry for the confusion. Will try and edit the answer and question accordingly. Thank you very much
@lalit Umm. If that is supposed to be SQL then the above does exactly that. As already explained. The timestamps are "filtered" to only those that are greater than 5 days before the current time and then only those that match are counted and we return ( having ) only those where the count is less than 3. Exactly the translation. You are just working the logic wrong.
