0

I am learning Django with pymongo.

I have a MongoDB collection where I am storing some words and their year-wise occurrences in some books.

The documents are stored in MongoDB in the following format:

{
   "_id":{
      "$oid":"625c51eec27c99b793074501"
   },
   "word":"entropy",
   "occurrence":13,
   "year":{
      "1942":[
         {
            "book":{
               "$oid":"625c51eec27c99b7930744f9"
            },
            "number":8,
            "sentence":[
               1,
               288,
               322,
               1237,
               2570,
               2585,
               2617,
               2634
            ]
         }
      ],
      "1947":[
         {
            "book":{
               "$oid":"625c5280c27c99b793077042"
            },
            "number":5,
            "sentence":[
               377,
               2108,
               2771,
               3467,
               3502
            ]
         }
      ]
   }
}

Now I want to get the list of the sentences with skip and limit (and the respective book id) queried by _id and for specific year range.

For example,

  1. I want to fetch an array where each row will be a dictionary containing 'year', 'book' and 'sentence'.
  2. The array will be queried by the _id and year range.
  3. A skip and limit will be applied on the sentence list

Is this a possible task using Django and pymongo? If yes, what is the fastest method?

So far I have done this:

search= {'$and': [{"_id": word_id_obj, "year.1942": {"$exists": 1}}]}
datalist= []
word_docs= wordcollec.find(search, {'year': 1, '_id': 0}).skip(1).limit(5)
sentlist['recordsFiltered']+= wordcollec.count_documents(search)

for b in word_docs:
    year_data= b['year'][1942]
    for by in year_data:
        i= i+1
        this_word= {'serial': i, 'year': cyear, 'book': str(by['book'])}
        datalist.append(this_word)

But obviously, it is not giving the desired result as the skip and limit are being applied to the root document object. Also the the year has a fixed value and no range.

It seems using '$slice' is an option. But I cannot figure it out.

Thanks for reading this far. And a lot more if you can throw some light.

2
  • Using data as a key, like "1942" and "1947", is usually not a good design, as you've discovered with your desired queries. Is it possible to change the design? If not, the queries are still possible but they won't be succinct, nor efficient. Commented Apr 20, 2022 at 23:48
  • Thank you. Yes... changing the DB design is my last resort. I was checking if it works with the existing one. Commented Apr 21, 2022 at 8:11

1 Answer 1

1

Here's one way to:

... fetch an array where each row will be a dictionary containing 'year', 'book' and 'sentence'.

db.collection.aggregate([
  { "$set": { "designWorkAround": { "$objectToArray": "$year" } } },
  { "$set": {
      "designWorkAround": {
        "$map": {
          "input": "$designWorkAround",
          "as": "yearArray",
          "in": {
            "year": "$$yearArray.k",
            "books": {
              "$map": {
                "input": "$$yearArray.v",
                "as": "bookArray",
                "in": {
                  "bookId": "$$bookArray.book",
                  "number": "$$bookArray.number",
                  "sentence": "$$bookArray.sentence"
                }
              }
            }
          }
        }
      }
    }
  },
  { "$unwind": "$designWorkAround" },
  { "$unwind": "$designWorkAround.books" },
  { "$project": {
      "_id": 0,
      "year": "$designWorkAround.year",
      "book": "$designWorkAround.books.bookId",
      "sentence": "$designWorkAround.books.sentence"
    }
  }
])

Try it on mongoplayground.net.

I don't know all the data generation nor queries that you might want, but I would probably redesign the collection and have something like one document per book with all the relevant fields in the document. This would make querying, indexing, etc., simpler and more efficient.

Sign up to request clarification or add additional context in comments.

1 Comment

So many thanks for your effort. I have actually redesigned the collection, but facing new problems. Please check: stackoverflow.com/questions/71962069/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.