
I have a MongoDB database in which there will be 5 to 10 inserts per day, each day. The structure of the data that will be inserted looks like this:

{
    question: 'text here', 
    date: '01/01/2000 01:01',
    title: 'Some title',
    client: 'name',
    assigned_to: ['name1', 'name2', 'name3'],
    answers: [
        {answer: 'bla bla'}, 
        {answer: 'bla bla'}, 
        {answer: 'bla bla'}
    ]
}

I need to search for a word or a series of words in all the text fields (question, title, and all the answers). I have been searching, and this is what I have found so far. There are 3 solutions:

a) $regex
b) Enable full-text search in MongoDB and use it
c) Save the structure with the following format (and then use multi-key search)

{
    question: 'text here', 
    question_s: ['text', 'here'],
    date: '01/01/2000 01:01',
    title: 'Some title',
    title_s: ['Some', 'title'],
    client: 'name',
    assigned_to: ['name1', 'name2', 'name3'],
    answers: [
        {answer: 'bla bla', answer_s: ['bla', 'bla']}, 
        {answer: 'bla bla', answer_s: ['bla', 'bla']}, 
        {answer: 'bla bla', answer_s: ['bla', 'bla']}
    ]
}
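
To make option c) concrete, here is a minimal sketch of how the extra *_s arrays could be filled in before saving. The helper names toWords and withWordArrays are hypothetical, and lowercasing the words and splitting on anything other than ASCII letters and digits are assumptions (the structure above keeps the original case).

```javascript
// Hypothetical helper: split a text into lowercase words for a *_s array.
// Assumption: only ASCII letters and digits count as word characters.
function toWords(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical helper: return a copy of the document with the *_s arrays added.
function withWordArrays(doc) {
  return {
    ...doc,
    question_s: toWords(doc.question),
    title_s: toWords(doc.title),
    answers: doc.answers.map((a) => ({ ...a, answer_s: toWords(a.answer) })),
  };
}

const stored = withWordArrays({
  question: 'Text here',
  title: 'Some title',
  answers: [{ answer: 'bla bla' }],
});
```

The original text fields are kept untouched, so the word arrays only serve the search and can be rebuilt at any time.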

Knowing the exact format of my data and how big it will be (estimated for the next 10 years), which one of those 3 is better in terms of speed and usability? (Also considering the time and brain pain each of those solutions requires for setup, configuration, etc.)

1 Answer

The second option is, of course, much better for speed, especially with indexing. The first one is much better for usability, since a simple RegExp is all you need.
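
As an illustration of the RegExp approach, the sketch below builds a query filter that looks for a phrase in the question, the title, and every answer. The function names are hypothetical; $regex, $options, and $or are standard MongoDB query operators, and the collection name mentioned in the comment is an assumption.

```javascript
// Escape regex metacharacters so the user's input is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a filter that matches the phrase, case-insensitively, in the
// question, the title, or any element of the answers array.
function buildRegexFilter(phrase) {
  const cond = { $regex: escapeRegex(phrase), $options: 'i' };
  return {
    $or: [
      { question: cond },
      { title: cond },
      { 'answers.answer': cond },
    ],
  };
}

const filter = buildRegexFilter('some title?');
// In the mongo shell this could be passed to find(), for example
// db.questions.find(filter), where "questions" is an assumed collection name.
```

An unanchored, case-insensitive pattern like this one generally cannot use an index efficiently, which is the speed trade-off mentioned above.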

Another option is to keep a separate collection of all words: each word is stored as the _id of a document, together with an array of the _ids of the items (questions, in your case) that contain it. That way it takes less storage space, and the separate collection is responsible for the search. It makes per-word search easier, and it also lets you search with a RegExp such as ^someText, which matches the beginning of the string and will use the index as well.
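
The idea can be sketched in memory, with a Map standing in for the words collection. The function names and the sample data are hypothetical; in MongoDB each Map entry would correspond to a document whose _id is the word, and the linear scan in searchByPrefix is only a stand-in for the index lookup that an anchored pattern like ^someText can use.

```javascript
// In-memory stand-in for the words collection: word -> Set of question ids.
function buildWordIndex(questions) {
  const index = new Map();
  for (const q of questions) {
    const texts = [q.question, q.title, ...q.answers.map((a) => a.answer)];
    for (const word of texts.join(' ').toLowerCase().split(/[^a-z0-9]+/)) {
      if (!word) continue; // skip empty pieces produced by the split
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(q._id);
    }
  }
  return index;
}

// Prefix search: collect the ids of all questions containing a word that
// starts with the given prefix. Ids are assumed to be numbers here.
function searchByPrefix(index, prefix) {
  const wanted = prefix.toLowerCase();
  const ids = new Set();
  for (const [word, questionIds] of index) {
    if (word.startsWith(wanted)) {
      for (const id of questionIds) ids.add(id);
    }
  }
  return [...ids].sort((a, b) => a - b);
}

const index = buildWordIndex([
  { _id: 1, question: 'How to index text', title: 'Indexing', answers: [{ answer: 'Use an index' }] },
  { _id: 2, question: 'Slow insert', title: 'Inserts', answers: [] },
]);
```

With this sample data, searchByPrefix(index, 'ind') finds only question 1, while the shorter prefix 'in' also matches "insert" and so finds both questions.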

The downside is that you need a mechanism that makes sure the words collection stays updated and consistent with the actual items (questions, in your case). But once that is done, it will be fast and easy to use, and it will return multiple search results with good performance regardless of the size of the words collection, because it uses the index.
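
One possible shape for that consistency step is sketched below, again with a Map from word to a Set of item ids standing in for the words collection. The function reindex and its signature are hypothetical: whenever the text of an item changes, it removes the item from words that disappeared and adds it to the words that are now present.

```javascript
// Split a text into a set of lowercase words (ASCII letters and digits only).
function toWordSet(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Update the word -> ids map after the text of item `id` has changed.
function reindex(index, id, oldText, newText) {
  const before = toWordSet(oldText);
  const after = toWordSet(newText);
  for (const word of before) {
    if (after.has(word)) continue; // word is still present, nothing to remove
    const ids = index.get(word);
    if (!ids) continue;
    ids.delete(id);
    if (ids.size === 0) index.delete(word); // drop words no item uses anymore
  }
  for (const word of after) {
    if (!index.has(word)) index.set(word, new Set());
    index.get(word).add(id);
  }
}

const wordIndex = new Map();
reindex(wordIndex, 1, '', 'slow insert');   // item 1 is created
reindex(wordIndex, 2, '', 'slow query');    // item 2 is created
reindex(wordIndex, 1, 'slow insert', 'fast insert'); // item 1 is edited
```

After these calls, "slow" points only to item 2 and "fast" only to item 1. Against a real database the same two steps would have to run on every insert, edit, and delete of a question.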

Another problem may arise if the index data of the words collection grows too large to fit into RAM: it will then be moved to disk, which generally slows down both writes and reads.
But for that you need millions of records or more, and at that point you can consider database clustering for the words collection, splitting it by first letter, for example.

2 Comments

So, the best thing I could do without actually brain-hurting myself and getting into configuration madness is the second option, right? Btw, do you know when full-text search will be considered stable for production?
Search is always very case-specific. Full-text search algorithms are already a popular topic, and there is a lot of material online about different implementations. It is still a complex task.
