I am a beginner with MongoDB and have an assignment to write an aggregation pipeline. My goal is to find which region in India has the largest number of cities with longitude between 75 and 80. I hope somebody can point out my misconceptions and/or mistakes; the code is very short, so I'm sure the pros will spot them right away.

Here is my code, followed by a sample document showing what the data structure looks like:

    pipeline = [
    {"$match" :  {"lon": {"$gte":75, "$lte" : 80}},
                 {'country' : 'India'}},
    { '$unwind' : '$isPartOf'},
    { "$group":
    {
                "_id": "$name",
                "count" :{"$sum":{"cityname":"$name"}} }},

    {"$sort": {"count": -1}},
     {"$limit": 1}

]


{
    "_id" : ObjectId("52fe1d364b5ab856eea75ebc"),
    "elevation" : 1855,
    "name" : "Kud",
    "country" : "India",
    "lon" : 75.28,
    "lat" : 33.08,
    "isPartOf" : [
        "Jammu and Kashmir",
        "Udhampur district"
    ],
    "timeZone" : [
        "Indian Standard Time"
    ],
    "population" : 1140
}

2 Answers

The following pipeline will give you the desired result. The first stage, $match, uses a standard MongoDB query to keep only the documents (cities) whose longitude is between 75 and 80 and whose country field is India. Since each document represents a city, the $unwind stage deconstructs the isPartOf array and outputs one document per element, with the array replaced by that element's value. An input document with n array elements therefore produces n output documents, one for each region the city belongs to. This sets up the $group stage, which groups the unwound documents by isPartOf and counts them with the $sum accumulator, giving the number of matching cities per region. The $project stage then renames the grouped fields to Region and NumberOfCities, $sort orders the documents by NumberOfCities in descending order, and $limit returns the top document, which is the region with the largest number of cities:

pipeline = [
    {
        "$match": {
            "lon": {"$gte": 75, "$lte": 80},
            "country": "India"
        }
    },
    {
        "$unwind": "$isPartOf"
    },
    {
        "$group": {
            "_id":  "$isPartOf",
            "count": {
                "$sum": 1
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "Region": "$_id",
            "NumberOfCities": "$count"
        }
    },
    {
        "$sort": {"NumberOfCities": -1}
    },
    { "$limit": 1 }
]
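
To see what each stage contributes without a running database, here is a minimal pure-Python sketch that mimics the stages on a handful of documents. It is not MongoDB itself: the helper `largest_region` is a hypothetical name, and every sample document except "Kud" (taken from the question) is made up for illustration.

```python
from collections import Counter

# Illustrative documents. Only "Kud" comes from the question; the rest are
# hypothetical and exist solely to exercise each stage.
cities = [
    {"name": "Kud", "country": "India", "lon": 75.28,
     "isPartOf": ["Jammu and Kashmir", "Udhampur district"]},
    {"name": "CityA", "country": "India", "lon": 76.0,
     "isPartOf": ["Jammu and Kashmir"]},
    {"name": "CityB", "country": "India", "lon": 90.0,   # longitude out of range
     "isPartOf": ["Jammu and Kashmir"]},
    {"name": "CityC", "country": "Nepal", "lon": 78.0,   # wrong country
     "isPartOf": ["RegionX"]},
]


def largest_region(docs):
    """Mimic $match, $unwind, $group, $project, $sort and $limit in plain Python."""
    # $match: keep Indian cities with 75 <= lon <= 80
    matched = [d for d in docs
               if d["country"] == "India" and 75 <= d["lon"] <= 80]
    # $unwind: one (region, city) pair per element of isPartOf
    unwound = [(region, d["name"]) for d in matched for region in d["isPartOf"]]
    # $group with {"$sum": 1}: count the unwound pairs per region
    counts = Counter(region for region, _ in unwound)
    if not counts:
        return None
    # $sort descending by count, then $limit 1
    region, number = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[0]
    # $project: rename the fields
    return {"Region": region, "NumberOfCities": number}


result = largest_region(cities)
print(result)
```

With this sample data, "Kud" contributes one count to each of its two regions, which is exactly the effect of unwinding isPartOf before grouping.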

There are some syntax and logical errors in your pipeline.

  1. {"$match" : {"lon": {"$gte":75, "$lte" : 80}}, {'country' : 'India'}},

The syntax here is wrong: both conditions belong in the same $match document, separated by a comma like any other key-value pairs.

  2. "_id": "$name",

You are grouping by city name and not by region.

  3. {"$sum":{"cityname":"$name"}}

The $sum operator needs a numeric value, either a constant or the result of an expression. {"cityname":"$name"} is not numeric, so it will be ignored.
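
The first point can be checked without a database. Assuming the pipeline is written as a Python literal (as the `pipeline = [...]` assignment in the question suggests), the broken stage is not even valid Python, because a dict literal cannot contain a bare value after a key-value pair. The sketch below uses only the standard `ast` module; `parses` is a hypothetical helper name.

```python
import ast

# The $match stage as written in the question, and the corrected form.
broken = '{"$match": {"lon": {"$gte": 75, "$lte": 80}}, {"country": "India"}}'
fixed = '{"$match": {"lon": {"$gte": 75, "$lte": 80}, "country": "India"}}'


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


broken_ok = parses(broken)
fixed_ok = parses(fixed)

# The corrected stage evaluates to a single dict with both conditions
# inside one $match document.
stage = ast.literal_eval(fixed)
print(broken_ok, fixed_ok, sorted(stage["$match"]))
```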

The correct pipeline would be:

[
    {"$match": {"lon": {"$gte": 75, "$lte": 80}, "country": "India"}},
    {"$unwind": "$isPartOf"},
    {"$group": {"_id": "$isPartOf", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 1}
]

If you also want the names of all the cities in that region that satisfy your condition, you can add "cities": {'$push': '$name'} to the $group stage.
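
As a sketch, the full pipeline with that extra accumulator might look like the following. The variable names `pipeline_with_cities` and `group_stage` are illustrative, and the commented-out call assumes a pymongo collection object called `db.cities`, which does not appear in the question.

```python
# Same stages as the corrected pipeline, with a $push accumulator added so
# each group also carries the names of the cities counted in it.
pipeline_with_cities = [
    {"$match": {"lon": {"$gte": 75, "$lte": 80}, "country": "India"}},
    {"$unwind": "$isPartOf"},
    {"$group": {
        "_id": "$isPartOf",
        "count": {"$sum": 1},
        "cities": {"$push": "$name"},
    }},
    {"$sort": {"count": -1}},
    {"$limit": 1},
]

group_stage = pipeline_with_cities[2]["$group"]
print(sorted(group_stage))

# With pymongo, and assuming a collection named db.cities (hypothetical):
# for doc in db.cities.aggregate(pipeline_with_cities):
#     print(doc["_id"], doc["count"], doc["cities"])
```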
