Query MongoDB to fetch a nested array with skip/limit by nested object key range in Django using pymongo

Question

I am learning Django with pymongo.

I have a MongoDB collection where I am storing some words and their year-wise occurrences in some books.

The documents are stored in MongoDB in the following format:

{
   "_id":{
      "$oid":"625c51eec27c99b793074501"
   },
   "word":"entropy",
   "occurrence":13,
   "year":{
      "1942":[
         {
            "book":{
               "$oid":"625c51eec27c99b7930744f9"
            },
            "number":8,
            "sentence":[
               1,
               288,
               322,
               1237,
               2570,
               2585,
               2617,
               2634
            ]
         }
      ],
      "1947":[
         {
            "book":{
               "$oid":"625c5280c27c99b793077042"
            },
            "number":5,
            "sentence":[
               377,
               2108,
               2771,
               3467,
               3502
            ]
         }
      ]
   }
}

Now I want to get the list of the sentences with skip and limit (and the respective book id) queried by _id and for specific year range.

For example,

I want to fetch an array where each row will be a dictionary containing 'year', 'book' and 'sentence'.
The array will be queried by the _id and year range.
A skip and limit will be applied on the sentence list

Is this a possible task using Django and pymongo? If yes, what is the fastest method?

So far I have done this:

search= {'$and': [{"_id": word_id_obj, "year.1942": {"$exists": 1}}]}
datalist= []
word_docs= wordcollec.find(search, {'year': 1, '_id': 0}).skip(1).limit(5)
sentlist['recordsFiltered']+= wordcollec.count_documents(search)

for b in word_docs:
    year_data= b['year'][1942]
    for by in year_data:
        i= i+1
        this_word= {'serial': i, 'year': cyear, 'book': str(by['book'])}
        datalist.append(this_word)

But obviously, it is not giving the desired result as the skip and limit are being applied to the root document object. Also the the year has a fixed value and no range.

It seems using '$slice' is an option. But I cannot figure it out.

Thanks for reading this far. And a lot more if you can throw some light.

Using data as a key, like "1942" and "1947", is usually not a good design, as you've discovered with your desired queries. Is it possible to change the design? If not, the queries are still possible but they won't be succinct, nor efficient. — rickhg12hs
– rickhg12hs, Commented Apr 20, 2022 at 23:48
Thank you. Yes... changing the DB design is my last resort. I was checking if it works with the existing one. — sariDon
– sariDon, Commented Apr 21, 2022 at 8:11

rickhg12hs · Accepted Answer · 2022-04-21 19:09:37Z

1

Here's one way to:

... fetch an array where each row will be a dictionary containing 'year', 'book' and 'sentence'.

db.collection.aggregate([
  { "$set": { "designWorkAround": { "$objectToArray": "$year" } } },
  { "$set": {
      "designWorkAround": {
        "$map": {
          "input": "$designWorkAround",
          "as": "yearArray",
          "in": {
            "year": "$$yearArray.k",
            "books": {
              "$map": {
                "input": "$$yearArray.v",
                "as": "bookArray",
                "in": {
                  "bookId": "$$bookArray.book",
                  "number": "$$bookArray.number",
                  "sentence": "$$bookArray.sentence"
                }
              }
            }
          }
        }
      }
    }
  },
  { "$unwind": "$designWorkAround" },
  { "$unwind": "$designWorkAround.books" },
  { "$project": {
      "_id": 0,
      "year": "$designWorkAround.year",
      "book": "$designWorkAround.books.bookId",
      "sentence": "$designWorkAround.books.sentence"
    }
  }
])

Try it on mongoplayground.net.

I don't know all the data generation nor queries that you might want, but I would probably redesign the collection and have something like one document per book with all the relevant fields in the document. This would make querying, indexing, etc., simpler and more efficient.

answered Apr 21, 2022 at 19:09

rickhg12hs

12k6 gold badges31 silver badges50 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

sariDon Over a year ago

So many thanks for your effort. I have actually redesigned the collection, but facing new problems. Please check: stackoverflow.com/questions/71962069/…

Collectives™ on Stack Overflow

Query MongoDB to fetch a nested array with skip/limit by nested object key range in Django using pymongo

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related