Okay, so I have a collection full of folks' emails. I want to efficiently look up emails by their domain without altering the existing data.

Currently I can look up the username SUPER fast since it's a prefix-based regex scan. My collection is about 1GB+ in size and my server isn't super powerful. I do have an index on "Email". The fast query is something like:

db.emails.find({"Email": {'$regex':'^johnsmith'}})

My index is simple and looks like this: db.emails.createIndex({ Email: 1 })

However, if I try to find the domain, I have to use a non-prefixed regex scan like this. The query still uses the index, but it takes about 10-20x more time:

db.emails.find({"Email": {'$regex':'sampledomain.com'}})

I've tried anchoring the regex to the end of the string (a suffix instead of a prefix) like this, but it's still just as slow:

db.emails.find({"Email": {'$regex':'sampledomain.com$'}})

I'm not sure if there's some sort of index I can make just on the domain portion of the email, but I'm pretty new to MongoDB so any advice would be appreciated.

  • Have you thought about using a text index? Commented Jun 18, 2020 at 13:59
  • Hmmm... is that different than a normal index? let me try it :D Commented Jun 18, 2020 at 14:00
  • So I looked it up and it sounds like text indexes are better for sentences/words, but a regular index is better for emails/usernames, so not sure it would help? Let me know if I'm wrong though. "A text index on the other hand will tokenize and stem the content of the field. So it will break the string into individual words or tokens, and will further reduce them to their stems so that variants of the same word will match ("talk" matching "talks", "talked" and "talking" for example, as "talk" is a stem of all three). Mostly useful for true text (sentences, paragraphs, etc)." Commented Jun 18, 2020 at 14:02
  • I think a text index will perform better here, but it won't match partial words: searching for johnsmith won't find johnsmithjr, for example, where a regex would. Commented Jun 18, 2020 at 14:05
  • I will test it and see if it's a better option. I do kind of need the wildcard for the domains/usernames though, so hopefully there is another option that is more efficient with regex! Commented Jun 18, 2020 at 14:07

1 Answer

If you are searching from the beginning of the domain, you can extract the domain and store it in another field. Once that field is indexed, you can use a prefix regexp match on it, just like the fast username query.
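A rough sketch of that idea, in case it helps. The field name `Domain` and the helper names `extractDomain`, `escapeRegex`, and `domainPrefixFilter` are my own assumptions, not anything from your schema. Note that this does add a new field to each document, although the existing `Email` values stay untouched. The domain is lowercased so that a case-sensitive anchored regex, which is the kind that can use the index efficiently, still matches.

```javascript
// Hypothetical helpers; "Domain" is an assumed field name.

// Return the lowercased part after the last "@", or null if there is no usable domain.
function extractDomain(email) {
  if (typeof email !== "string") return null;
  const at = email.lastIndexOf("@");
  if (at < 0 || at === email.length - 1) return null;
  return email.slice(at + 1).toLowerCase();
}

// Escape regex metacharacters so "." in a domain matches a literal dot.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter with an anchored (prefix) regex on the assumed Domain field.
function domainPrefixFilter(prefix) {
  return { Domain: { $regex: "^" + escapeRegex(prefix.toLowerCase()) } };
}

// In the mongo shell, the pieces could be wired together roughly like this:
//   db.emails.find({ Domain: { $exists: false } }).forEach(function (doc) {
//     db.emails.updateOne({ _id: doc._id }, { $set: { Domain: extractDomain(doc.Email) } });
//   });
//   db.emails.createIndex({ Domain: 1 });
//   db.emails.find(domainPrefixFilter("sampledomain.com"));
```

The backfill loop runs once over the collection, so on a 1GB+ collection it may take a while. New documents would also need `Domain` set at insert time to keep the field in sync.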
