I have a collection of user-generated posts. They contain the following fields:

_id: String
groupId: String // id of the group this was posted in
authorId: String
tagIds: [String]
latestActivity: Date // updated whenever someone comments on this post
createdAt: Date
numberOfVotes: Number
...some more...

My queries always look something like this...

Posts.find({
  groupId: {$in: [...]},
  authorId: 'xyz', // only SOMETIMES included
  tagIds: {$in: [...]}, // only SOMETIMES included
}, {
  sort: {latestActivity: -1, _id: -1} // first key may also be createdAt or numberOfVotes, with direction +1 or -1
});

So I'm always querying on groupId, but only sometimes adding tagIds or authorId. I'm also switching out the field on which the results are sorted. What would my best indexing strategy look like?
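
To make the query shapes concrete, here is a minimal sketch of a helper that assembles the selector and sort options described above. The name buildPostsQuery and its parameter object are hypothetical and not part of my actual code; only the field names come from the schema.

```javascript
// Hypothetical helper: builds the selector and options for Posts.find().
// groupIds is always required; authorId and tagIds are added only when given.
function buildPostsQuery({ groupIds, authorId, tagIds, sortField = 'latestActivity', direction = -1 }) {
  const selector = { groupId: { $in: groupIds } };
  if (authorId !== undefined) selector.authorId = authorId;
  if (tagIds !== undefined && tagIds.length > 0) selector.tagIds = { $in: tagIds };

  // Sort by the chosen field first, then by _id as a tie-breaker.
  const options = { sort: { [sortField]: direction, _id: -1 } };
  return { selector, options };
}

// Intended use (Posts is the collection from the question):
//   const { selector, options } = buildPostsQuery({ groupIds: ['g1'], authorId: 'xyz' });
//   Posts.find(selector, options);
```

Every combination this produces starts with groupId in the selector, which is why groupId looks like the natural index prefix.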

From what I've read so far here on SO, I would probably create multiple compound indexes and have them always start with {groupId: 1, _id: -1}. Because those fields are included in every query, they are good prefix candidates. Now, I'm guessing that creating a new index for every possible combination wouldn't be a good idea memory-wise. Should I therefore keep it like that and only index groupId and _id?

Thanks.

  • It actually makes no sense at all to use _id in a compound index. It's by definition unique, and therefore no other field can make any difference, since once you match on _id, that's it! As for other key combinations, if they are commonly used they should be added to indexes. Prefixes really should always be both what is commonly used and what reduces the number of matches the most. As to which combinations you should use? That's way too broad to ask here without specifics of what your queries are actually doing. Commented Apr 19, 2016 at 11:26
  • Using _id as the final part of your compound index makes sense if you are sorting by it, especially since it doubles as a timestamp. Commented Apr 19, 2016 at 11:58

1 Answer

You are going in the right direction. With compound indexes, you want the most selective fields on the left and the range fields on the right. {groupId: 1, _id: -1} satisfies this.

It's also important to remember that a compound index is used when its keys appear in the query from left to right, so one compound index can cover many common scenarios. If, for example, your index was {groupId: 1, authorId: 1, tagIds: 1} and your query was Posts.find({groupId: {$in: [...]}, authorId: 'xyz'}), that index would get used (even though tagIds is absent). Posts.find({groupId: {$in: [...]}, tagIds: {$in: [...]}}) would also use this index: the first and third fields of the index are present, so if Mongo doesn't find a more specific index, it will pick this one. However, Posts.find({authorId: 'xyz', tagIds: {$in: [...]}}) would not use the index, because the first field in the index is missing from the query.
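
The left-to-right rule can be sketched as a small function. This is a deliberately simplified model for illustration, not how the MongoDB query planner is actually implemented, and the name matchedPrefixLength is made up for this example. It counts how many leading index fields a query constrains before it hits a gap.

```javascript
// Simplified model of the prefix rule, NOT the real MongoDB query planner:
// counts how many leading index fields are constrained by the query.
function matchedPrefixLength(indexFields, queryFields) {
  const queried = new Set(queryFields);
  let length = 0;
  for (const field of indexFields) {
    if (!queried.has(field)) break; // stop at the first index field the query skips
    length += 1;
  }
  return length;
}

const index = ['groupId', 'authorId', 'tagIds'];
console.log(matchedPrefixLength(index, ['groupId', 'authorId'])); // 2
console.log(matchedPrefixLength(index, ['groupId', 'tagIds']));   // 1
console.log(matchedPrefixLength(index, ['authorId', 'tagIds']));  // 0
```

A result of 0 corresponds to the last query above, where the index is not used. A result of 1 for the groupId plus tagIds query shows that only groupId forms a contiguous prefix there, even though the index as a whole can still be chosen for that query.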

Given all of that, I would suggest starting with {groupId: 1, authorId: 1, tagIds: 1, _id: -1}. groupId is the only non-optional field in your queries, so it goes on the left, before the optional ones. It looks like authorId is more selective than tagIds, so it should come right after groupId. You're sorting by _id, so that should go on the right. Be sure to analyze query performance (for example with explain()) for each of the different ways you query the data, and make sure they all choose this index; otherwise you'll need to make more tweaks or possibly add a second compound index. You could then create other indexes and force a query to use one of them (with hint()) to do some A/B testing on performance.
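
As a rough sketch of that A/B testing, something like the following could work. It assumes collection is a MongoDB Node.js driver collection (in Meteor, presumably obtained via Posts.rawCollection()) and that every index passed in has already been created. The function name compareIndexes is hypothetical.

```javascript
// Assumes `collection` is a MongoDB Node.js driver collection and that
// each index in `indexSpecs` already exists (hinting a missing index fails).
async function compareIndexes(collection, selector, sort, indexSpecs) {
  const results = [];
  for (const spec of indexSpecs) {
    // hint() forces the given index; explain('executionStats') runs the
    // query and reports how much work it took.
    const plan = await collection
      .find(selector)
      .sort(sort)
      .hint(spec)
      .explain('executionStats');
    const stats = plan.executionStats;
    results.push({
      index: spec,
      keysExamined: stats.totalKeysExamined,
      docsExamined: stats.totalDocsExamined,
      millis: stats.executionTimeMillis,
    });
  }
  return results;
}
```

Run it once per query shape and compare keysExamined and docsExamined against the number of documents returned; timings on a small test dataset can be misleading, so treat millis as a rough signal only.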

2 Comments

Thanks a lot, that really helped. Now, wouldn't it make sense to add another compound index where authorId and tagIds switch positions? Or would the extra performance benefit not make up for the memory cost?
To really answer this, you will have to try both and compare the results. Try each index individually first, then try having both on the collection at the same time. If having two compound indexes is much more performant than having just one, it's probably worth the trade-off. What I think you'll see is that one index performs great, or at least well enough, for most queries.