
I have a MongoDB collection whose documents look like this:

{
    "_id" : ObjectId("561b42d4e4b0d4227d011d2c"),
    "product_related_data" : {
        "depth" : 6,
        "height" : 23,
        "product_barcode" : "54491472",
        "product_name" : "xyz product",
        "product_uuid" : "009b9846-b3ad-49f7-a7a0-d35a04f83480",
        "width" : 6
    },
    "sensostrip_data" : {
        "barcode" : "130150208299",
        "battery_level" : 2.894
    },
    "stock_related_data" : {
        "calculated_max_product_percentage" : 15.625,
        "calculated_min_product_percentage" : 12.5,
        "current_stock" : 2,
        "max_stock" : 6,
        "stock_difference_percentage" : 0,
        "stock_difference_value" : 0,
        "stock_percentage" : 37.5
    },
    "store_data" : {
        "city_uuid" : "cbb4dfe8-172b-11e4-a1f0-00163ed23ec2",
        "ip" : "10.0.1.1",
        "retailer_uuid" : "8c33c32c-5903-11e4-a1f0-00163ed23ec2",
        "store_name" : "xyz store",
        "store_uuid" : "15a6cc90-081f-11e5-b213-001e6745ff8d"
    },
    "time" : {
        "check_date" : "2015-10-11 11:53:55",
        "previous_check_date" : "2015-10-11 11:48:57"
    },
    "id" : "6be54bef-0aa3-456c-b912-1731f8154e7d"
}

The MongoDB query I'm currently running returns all documents that match a list of store_uuid values and a single product_uuid:

db.readings.find({ $and: [
    { "store_data.store_uuid": { $in: ["15a6cc90-081f-11e5-b213-001e6745ff8d",
                                       "217b983b-5904-11e4-a1f0-00163ed23ec2",
                                       "5337d78d-5904-11e4-a1f0-00163ed23ec2"] } },
    { "product_related_data.product_uuid": "f44aa29d-09ce-4902-bf12-d45d44b3dfd0" }
] })

My current Java implementation, which uses a projection, looks like this:

import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

DBCollection table = databaseConncetion().getCollection("readings");

// Projection: return only the fields needed downstream
BasicDBObject sensorReturn = new BasicDBObject("sensostrip_data.barcode", 1)
        .append("stock_related_data.stock_percentage", 1)
        .append("store_data.store_uuid", 1)
        .append("time.check_date", 1);

// Filter: readings from any of the given stores AND for the given product
BasicDBObject clause1 = new BasicDBObject("store_data.store_uuid", new BasicDBObject("$in", StoreIds));
BasicDBObject clause2 = new BasicDBObject("product_related_data.product_uuid", productId);

BasicDBList and = new BasicDBList();
and.add(clause1);
and.add(clause2);

DBObject query = new BasicDBObject("$and", and);

// Newest readings first, at most 100 documents
DBCursor cursor = table.find(query, sensorReturn)
        .sort(new BasicDBObject("time.check_date", -1))
        .limit(100);

However, I need this query to group the results by barcode, returning only the reading with the latest check_date for each barcode.

Answer

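You can use the aggregation framework for this. The following pipeline filters the readings, sorts them by check_date with the newest first, and then groups them by barcode, so that $first picks the values from each barcode's latest reading: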


Mongo shell:

pipeline = [
    {
        "$match": {
            "store_data.store_uuid": {
                "$in": [
                    "15a6cc90-081f-11e5-b213-001e6745ff8d",
                    "217b983b-5904-11e4-a1f0-00163ed23ec2",
                    "5337d78d-5904-11e4-a1f0-00163ed23ec2"
                ]
            },
            "product_related_data.product_uuid": "f44aa29d-09ce-4902-bf12-d45d44b3dfd0"
        }
    },
    { "$sort": { "time.check_date": -1 } },
    {
        "$group": {
            "_id": "$sensostrip_data.barcode",
            "stock_percentage": { "$first": "$stock_related_data.stock_percentage" },
            "store_uuid": { "$first": "$store_data.store_uuid" },
            "check_date": { "$first": "$time.check_date" }
        }
    },
    { "$limit": 100 }
];
db.readings.aggregate(pipeline);
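To make the $sort + $group/$first combination concrete, here is a plain-Java sketch that applies the same four steps to an in-memory list instead of the collection. The Reading class, its field names, and the sample values in main are hypothetical, and the sketch only illustrates what the server computes; it is not a substitute for running the aggregation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for one "readings" document,
// holding only the fields the pipeline uses.
final class Reading {
    final String barcode, storeUuid, productUuid, checkDate;
    final double stockPercentage;

    Reading(String barcode, String storeUuid, String productUuid, String checkDate, double stockPercentage) {
        this.barcode = barcode;
        this.storeUuid = storeUuid;
        this.productUuid = productUuid;
        this.checkDate = checkDate;
        this.stockPercentage = stockPercentage;
    }
}

final class LatestPerBarcode {
    // Mirrors $match -> $sort(check_date: -1) -> $group(_id: barcode, $first ...) -> $limit
    static List<Reading> latest(List<Reading> readings, Set<String> storeIds, String productId, int limit) {
        // $match: keep readings for the given stores and product
        List<Reading> matched = new ArrayList<>();
        for (Reading r : readings) {
            if (storeIds.contains(r.storeUuid) && r.productUuid.equals(productId)) {
                matched.add(r);
            }
        }

        // $sort: newest first (plain string order, as for the string check_date field)
        matched.sort(Comparator.comparing((Reading r) -> r.checkDate).reversed());

        // $group with $first: the first reading seen for a barcode is its latest one
        Map<String, Reading> groups = new LinkedHashMap<>();
        for (Reading r : matched) {
            groups.putIfAbsent(r.barcode, r);
        }

        // $limit: caps the number of groups (barcodes), not the number of readings
        List<Reading> result = new ArrayList<>(groups.values());
        return new ArrayList<>(result.subList(0, Math.min(limit, result.size())));
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only
        List<Reading> readings = Arrays.asList(
                new Reading("barcode-1", "store-a", "product-1", "2015-10-11 11:48:57", 50.0),
                new Reading("barcode-1", "store-a", "product-1", "2015-10-11 11:53:55", 37.5),
                new Reading("barcode-2", "store-b", "product-1", "2015-10-11 11:50:00", 80.0),
                new Reading("barcode-3", "store-c", "product-1", "2015-10-11 11:55:00", 10.0));
        Set<String> stores = new HashSet<>(Arrays.asList("store-a", "store-b"));

        for (Reading r : latest(readings, stores, "product-1", 100)) {
            System.out.println(r.barcode + " " + r.checkDate + " " + r.stockPercentage);
        }
        // prints:
        // barcode-1 2015-10-11 11:53:55 37.5
        // barcode-2 2015-10-11 11:50:00 80.0
    }
}
```

One difference from the real pipeline: the sketch emits groups ordered by their latest check_date, but the MongoDB manual states that $group does not order its output documents, so the $limit of 100 in the pipeline above keeps an arbitrary 100 barcodes. If you want the 100 most recently updated barcodes, a { "$sort": { "check_date": -1 } } stage between $group and $limit should do it.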

Java test implementation:

import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class JavaAggregation {
    public static void main(String[] args) throws UnknownHostException {

        // Same filter values as in the shell example
        List<String> storeIds = Arrays.asList(
                "15a6cc90-081f-11e5-b213-001e6745ff8d",
                "217b983b-5904-11e4-a1f0-00163ed23ec2",
                "5337d78d-5904-11e4-a1f0-00163ed23ec2");
        String productId = "f44aa29d-09ce-4902-bf12-d45d44b3dfd0";

        MongoClient mongo = new MongoClient(); // connects to localhost:27017
        DB db = mongo.getDB("test");

        DBCollection coll = db.getCollection("readings");

        // $match: readings for the given stores and product
        DBObject match = new BasicDBObject("$match",
                new BasicDBObject("store_data.store_uuid", new BasicDBObject("$in", storeIds))
                        .append("product_related_data.product_uuid", productId));

        // $sort: newest readings first, so $first below picks the latest reading
        DBObject sort = new BasicDBObject("$sort", new BasicDBObject("time.check_date", -1));

        // $group: one output document per barcode
        DBObject groupFields = new BasicDBObject("_id", "$sensostrip_data.barcode");
        groupFields.put("stock_percentage", new BasicDBObject("$first", "$stock_related_data.stock_percentage"));
        groupFields.put("store_uuid", new BasicDBObject("$first", "$store_data.store_uuid"));
        groupFields.put("check_date", new BasicDBObject("$first", "$time.check_date"));
        // append any other necessary fields

        DBObject group = new BasicDBObject("$group", groupFields);

        // $limit: at most 100 grouped (per-barcode) documents
        DBObject limit = new BasicDBObject("$limit", 100);

        // put it all together
        List<DBObject> pipeline = Arrays.asList(match, sort, group, limit);

        AggregationOutput output = coll.aggregate(pipeline);

        for (DBObject result : output.results()) {
            System.out.println(result);
        }

        mongo.close();
    }
}
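Both versions sort on time.check_date, which the sample document stores as a string rather than a BSON date, so the sort compares strings. That yields chronological order only because every value appears to use the same zero-padded "yyyy-MM-dd HH:mm:ss" layout (an assumption based on the sample document). The plain-Java sketch below, which uses no MongoDB code, checks that string order and parsed date order agree for a few timestamps; the class name CheckDateOrder and the extra timestamps are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: compares string order and chronological order of check_date values.
final class CheckDateOrder {
    // Layout assumed from the sample document, e.g. "2015-10-11 11:53:55"
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Newest first by plain string comparison (what a sort on a string field does)
    static List<String> sortAsStrings(List<String> dates) {
        List<String> copy = new ArrayList<>(dates);
        copy.sort(Comparator.reverseOrder());
        return copy;
    }

    // Newest first by the parsed date-time value
    static List<String> sortAsDates(List<String> dates) {
        List<String> copy = new ArrayList<>(dates);
        copy.sort(Comparator.comparing((String s) -> LocalDateTime.parse(s, FORMAT)).reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "2015-10-11 11:48:57",
                "2015-10-11 11:53:55",
                "2015-09-30 23:59:59",
                "2015-10-02 08:00:00");
        System.out.println(sortAsStrings(dates));
        // prints [2015-10-11 11:53:55, 2015-10-11 11:48:57, 2015-10-02 08:00:00, 2015-09-30 23:59:59]
        System.out.println(sortAsStrings(dates).equals(sortAsDates(dates)));
        // prints true
    }
}
```

If some documents used a different layout (for example without zero padding), the string sort would silently misorder them; storing check_date as a BSON date avoids that dependency.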
Comments

Thank you very much for this thorough answer. The collection I'm querying stores sensor readings that arrive every 5 minutes, so it holds millions of documents. Is anything used here particularly expensive? My understanding is that $group in MongoDB is very expensive; is that correct?
Contrary to your understanding, that is not necessarily the case: the performance of an aggregation pipeline depends on several factors. The computation happens on the server, a $match (and a following $sort) at the start of the pipeline can use indexes just as a find() query can, and for common aggregation operations the pipeline is a faster alternative to Map/Reduce. For a detailed explanation of how to improve performance at this data volume, read the Aggregation Pipeline Optimization section of the MongoDB manual.
