I have Twitter data that looks like this:

db.users.findOne()
{
    "_id" : ObjectId("578ffa8e7eb9513f4f55a935"),
    "user_name" : "koteras",
    "retweet_count" : 0,
    "tweet_followers_count" : 461,
    "source" : "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
    "coordinates" : null,
    "tweet_mentioned_count" : 1,
    "tweet_ID" : "755891629932675072",
    "tweet_text" : "RT @ochocinco: I beat them all for 10 straight hours #FIFA16KING",
    "user" : {
        "CreatedAt" : ISODate("2011-12-27T09:04:01Z"),
        "FavouritesCount" : 5223,
        "FollowersCount" : 461,
        "FriendsCount" : 619,
        "UserId" : 447818090,
        "Location" : "501"
    }
}

For example, I want to find the number of users that have "FollowersCount" greater than "FavouritesCount". How can I do that?

2 Answers

The $where operator can do this: it evaluates a JavaScript expression against each document, so it can compare two fields of the same document.

db.users.find( { $where: function() { return (this.user.FollowersCount > this.user.FavouritesCount) } } );

But keep in mind that $where runs JavaScript against every document and cannot use an index, so it will be slow on large collections.
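To make that cost concrete, here is a rough sketch in plain JavaScript (run in Node, not against MongoDB) of what $where effectively does: it calls the predicate once per document, with `this` bound to that document. The helper name countWhere and every sample document except the one from the question are made up for illustration.

```javascript
// Plain JavaScript sketch, NOT MongoDB code: mimics a $where-style full scan.
// Only the first document comes from the question; the others are invented.
const sampleUsers = [
  { user_name: "koteras", user: { FollowersCount: 461, FavouritesCount: 5223 } },
  { user_name: "example_a", user: { FollowersCount: 900, FavouritesCount: 12 } },
  { user_name: "example_b", user: { FollowersCount: 50, FavouritesCount: 50 } },
];

// Same predicate as the one passed to $where above.
function hasMoreFollowersThanFavourites() {
  return this.user.FollowersCount > this.user.FavouritesCount;
}

// Hypothetical helper: evaluates the predicate on every document, one by one.
function countWhere(docs, predicate) {
  return docs.filter((doc) => predicate.call(doc)).length;
}

const matchingCount = countWhere(sampleUsers, hasMoreFollowersThanFavourites);
console.log(matchingCount); // prints 1
```

Every document is visited regardless of how selective the condition is, which is why this approach scales poorly.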

Another option is an aggregation pipeline that projects the difference between the two fields and then applies a $match on that difference:

db.users.aggregate([
  {$project: {
    diff: {$subtract: ["$user.FollowersCount", "$user.FavouritesCount"]},
    // project remaining fields here
    }
  },
  {$match: {diff: {$gt: 0}}}
])

In my experience, the second approach is much faster than the first.

4 Comments

And in both cases, apply itcount() to the returned cursor to get the count of matching documents.
Thanks man! What about the "and" function? If I want to find tweets that contain certain text and come from a specific location?
I've tried this and it doesn't work: db.users.find({$and [{"user.Location": "501"},{ tweet_text: /UEFA/}]})
There's no need for $and here. db.users.find({"user.Location": "501", tweet_text: /UEFA/}) should work as well.

To get the number of users that have "FollowersCount" greater than "FavouritesCount", you can use the aggregation framework, which provides comparison operators such as $cmp.

The first approach computes the comparison in a $project stage with $cmp, which returns 1 when the first value is greater than the second, 0 when they are equal, and -1 otherwise. A subsequent $match stage keeps the documents where that value is 1, and a final $group stage counts them:

db.users.aggregate([
    {
        "$project": {               
            "hasMoreFollowersThanFavs": { 
                "$cmp": [ "$user.FollowersCount", "$user.FavouritesCount" ]
            }
        }
    },
    { "$match": { "hasMoreFollowersThanFavs": 1 } },    
    {
        "$group": {
            "_id": null,
            "count": { "$sum": 1 }
        }
    }
])
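As a sanity check on the $cmp logic, the three stages can be mimicked in plain JavaScript on a few in-memory documents. This is only a sketch that runs outside MongoDB; the sample values (apart from the first, taken from the question) and the function names cmp and countMoreFollowers are assumptions for illustration, not part of any MongoDB API.

```javascript
// Mimics $cmp for numbers: 1 if a > b, -1 if a < b, 0 if they are equal.
function cmp(a, b) {
  if (a > b) return 1;
  if (a < b) return -1;
  return 0;
}

// Hypothetical helper that walks through the $project, $match and $group steps.
function countMoreFollowers(docs) {
  // $project: keep only the comparison result
  const projected = docs.map((doc) => ({
    hasMoreFollowersThanFavs: cmp(doc.user.FollowersCount, doc.user.FavouritesCount),
  }));
  // $match: keep documents where the first field is strictly greater
  const matched = projected.filter((doc) => doc.hasMoreFollowersThanFavs === 1);
  // $group with _id: null: one summary document, or nothing when there is no input
  return matched.length > 0 ? [{ _id: null, count: matched.length }] : [];
}

const sample = [
  { user: { FollowersCount: 461, FavouritesCount: 5223 } }, // from the question
  { user: { FollowersCount: 900, FavouritesCount: 12 } },   // made up
  { user: { FollowersCount: 50, FavouritesCount: 50 } },    // made up, a tie
  { user: { FollowersCount: 10, FavouritesCount: 3 } },     // made up
];

const result = countMoreFollowers(sample);
console.log(JSON.stringify(result)); // prints [{"_id":null,"count":2}]
```

Ties produce 0 from $cmp, so users with equal counts are excluded. When nothing matches, the $group stage returns no document at all rather than a count of 0.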

Another option is a single $redact stage, which combines the work of the $project and $match stages above: it keeps the documents that satisfy the condition using the $$KEEP system variable and discards the rest using the $$PRUNE system variable:

db.users.aggregate([
    {
        "$redact": {
            "$cond": [
                { 
                    "$eq": [
                        { "$cmp": [ "$user.FollowersCount", "$user.FavouritesCount" ] }, 
                        1
                    ]
                }, 
                "$$KEEP", 
                "$$PRUNE"
            ]
        }
    },  
    {
        "$group": {
            "_id": null,
            "count": { "$sum": 1 }
        }
    }
])
