
I have documents that look like this:

    {
      "_id": ObjectId("5444fc67931f8b040eeca671"),
      "meta": {
        "SessionID": "45",
        "configVersion": "1",
        "DeviceID": "55",
        "parentObjectID": "55",
        "nodeClass": "79",
        "dnProperty": "16"
      },
      "cfg": {
        "Name": "test"
      }
    }

The names and the data are just for testing at the moment, but I have a total of 25 million documents in the DB. I'm using find() to fetch specific documents; in this case the query uses four criteria: dnProperty, nodeClass, DeviceID and configVersion, none of which are unique.

At the moment I have the index set up as simply as:

    db.collection.ensureIndex({ "nodeClass": 1, "DeviceID": 1, "configVersion": 1, "dnProperty": 1 })

In other words, I have an index on the four query fields. I still have huge problems if a search matches no documents at all. In my example all the values are random between 1 and 100, so if I do a find() with one of the values > 100, it takes anywhere from 30-180 seconds to complete and uses all of my 8 GB of RAM; with no RAM left, the computer becomes very, very slow.
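For reference, a query of the kind I'm running looks roughly like this (the collection name and the values are just placeholders):

    // typical lookup on the four indexed fields; none of the values are unique
    db.collection.find({
        "nodeClass": "79",
        "DeviceID": "55",
        "configVersion": "1",
        "dnProperty": "16"
    })

    // appending .explain() shows whether the compound index is used and
    // how many documents / index entries were scanned
    db.collection.find({ "nodeClass": "101" }).explain()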

What would be better indexes? Am I using indexes correctly? Do I simply need more RAM, since MongoDB will put "all" of the DB in its working memory? Would you recommend another DB (other than Mongo) that handles this better?

Sorry for the multiple questions; I hope they are short enough that you can answer them.

  • Can you show us your slow queries? Commented Oct 22, 2014 at 7:45
  • 25 million documents is a lot. You could consider these things: use multiple collections to split up the data, or use a dataset service (like Google BigQuery); it's fast and SQL-like. Commented Oct 22, 2014 at 7:46

1 Answer


MongoDB uses memory-mapped files, which means a copy of your data and indexes is kept in RAM and queries are served from there whenever possible. In your scenario the queries are slow because your data + index size is so large that it does not fit in RAM, so there is a lot of I/O activity to fetch data from disk, and that is the bottleneck.
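As a rough check (the collection name below is just an example), you can compare the size of your data plus indexes with the RAM you have from the mongo shell:

    // database-wide data size and index size
    db.stats()

    // per-collection numbers: storageSize + totalIndexSize is roughly what
    // has to fit in RAM for queries to avoid hitting disk
    db.collection.stats()

    // page faults climbing while you query is a sign the working set
    // does not fit in memory
    db.serverStatus().extra_info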

Sharding helps solve this problem: if you partition/shard your data across, for example, 5 machines, you have 8 GB * 5 = 40 GB of RAM, which can hold your working set (data + indexes) in memory, so the I/O overhead is reduced and performance improves.

Hence, in this case, indexes alone will not improve performance beyond a certain point; you will need to shard your data across multiple machines. Sharding tends to increase read as well as write throughput roughly linearly. See Sharding in MongoDB.
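A minimal sketch of what that looks like from a mongos shell (the database name, collection name and shard key here are assumptions; choosing a good shard key deserves its own analysis):

    // enable sharding for the database
    sh.enableSharding("mydb")

    // the collection needs an index that starts with the shard key
    db.collection.ensureIndex({ "nodeClass": 1, "DeviceID": 1 })

    // shard the collection on an assumed compound key
    sh.shardCollection("mydb.collection", { "nodeClass": 1, "DeviceID": 1 })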
