0

How to Create a Case-Insensitive Unique Compound Index in MongoDB?

Question Body: I'm trying to create a compound unique text index in MongoDB with the following fields:

tenant_id: 1 seller_id: 1 seller_sku_code: text However, I'm encountering an error when trying to create the index: Index build failed: 4df81a22-042f-4e51-bbec-b0f8a7dafe2e: Collection product_db.skus ( 83cc8978-b0fe-4c4b-ae58-691e32db7f95 ) :: caused by :: E11000 duplicate key error collection: product_db.skus index: tenant_id_1_seller_id_1_seller_sku_code_text dup key: { tenant_id: "1", seller_id: "113", _fts: "acc", _ftsx: 0.75 }

I need to ensure that the seller_sku_code is unique within the database and is case-insensitive. For example:

"ABC" and "abc" should not be allowed simultaneously. "ACC-2001" and "ACC-2000" should be allowed to coexist.

How can I create this unique, case-insensitive index on seller_sku_code while still ensuring the uniqueness of the tenant_id and seller_id fields?

1 Answer 1

2

Text index is probably not what you are looking for.

"ACC-2001" and "ACC-2000" are strings, and contain some alphanumeric characters, but they are not text.

To ensure that no two documents in the entire database have the same seller_sku_code, create a unique index on just that field, which a case-insensitive collation, like:

db.collection.createIndex(
     {seller_sku_code:1},
     {
       unique:true,  
       collation:
         {
            locale : "en_US"
            strength : 1
         }
      }
)

Use the locale that is appropriate, and strength of 1 or 2 for case-insensitive.

The problem you were encountering with the index build is the text index tokenizer sees the dash - as a separator, so instead of indexing the values "ACC-2001" it was indexing the values "ACC" and "2001". That would prevent any other document with the same tenant and seller_id from having a seller_sku_code starting with "ACC-".

Sign up to request clarification or add additional context in comments.

1 Comment

agree with you. The issue is due to the tokenisation leading to duplication. It should not be tokenised, for that reason simple index should used instead. I am withdrawing my incorrect answer. Thanks for the corrections.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.