How to Create a Case-Insensitive Unique Compound Index in MongoDB?

Question

Question Body: I'm trying to create a compound unique text index in MongoDB with the following fields:

tenant_id: 1 seller_id: 1 seller_sku_code: text However, I'm encountering an error when trying to create the index: Index build failed: 4df81a22-042f-4e51-bbec-b0f8a7dafe2e: Collection product_db.skus ( 83cc8978-b0fe-4c4b-ae58-691e32db7f95 ) :: caused by :: E11000 duplicate key error collection: product_db.skus index: tenant_id_1_seller_id_1_seller_sku_code_text dup key: { tenant_id: "1", seller_id: "113", _fts: "acc", _ftsx: 0.75 }

I need to ensure that the seller_sku_code is unique within the database and is case-insensitive. For example:

"ABC" and "abc" should not be allowed simultaneously. "ACC-2001" and "ACC-2000" should be allowed to coexist.

How can I create this unique, case-insensitive index on seller_sku_code while still ensuring the uniqueness of the tenant_id and seller_id fields?

Joe · Accepted Answer · 2024-05-15 15:50:06Z

2

Text index is probably not what you are looking for.

"ACC-2001" and "ACC-2000" are strings, and contain some alphanumeric characters, but they are not text.

To ensure that no two documents in the entire database have the same seller_sku_code, create a unique index on just that field, which a case-insensitive collation, like:

db.collection.createIndex(
     {seller_sku_code:1},
     {
       unique:true,  
       collation:
         {
            locale : "en_US"
            strength : 1
         }
      }
)

Use the locale that is appropriate, and strength of 1 or 2 for case-insensitive.

The problem you were encountering with the index build is the text index tokenizer sees the dash - as a separator, so instead of indexing the values "ACC-2001" it was indexing the values "ACC" and "2001". That would prevent any other document with the same tenant and seller_id from having a seller_sku_code starting with "ACC-".

answered May 15, 2024 at 15:50

Joe

29k4 gold badges25 silver badges50 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

WeDoTheBest4You Over a year ago

agree with you. The issue is due to the tokenisation leading to duplication. It should not be tokenised, for that reason simple index should used instead. I am withdrawing my incorrect answer. Thanks for the corrections.

Collectives™ on Stack Overflow

How to Create a Case-Insensitive Unique Compound Index in MongoDB?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related