
I have a boolean array that represents the store availability of retail products across 3,000 different stores.

My schema looks like this:

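from pymilvus import DataType, FieldSchema

# Primary key: one record per product
product_id = FieldSchema(
    name="product_id",
    dtype=DataType.INT64,
    is_primary=True,
    auto_id=False,
)

# 768-dimensional embedding of the product title
product_title_vector = FieldSchema(
    name="product_title_vector",
    dtype=DataType.FLOAT_VECTOR,
    dim=768,
)

# One boolean per store: element i is True if the product is stocked in store i + 1
store_availability = FieldSchema(
    name="store_availability",
    dtype=DataType.ARRAY,
    element_type=DataType.BOOL,
    max_capacity=3000,
)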

The catch is that, as a rule of thumb, a boolean array of 3,000 elements takes about 3 KB per record (one byte per boolean).

I have 5 million items (records), so the store_availability field alone takes about 15 GB:

memory = 3,000 bytes/item * 5,000,000 items = 15 GB
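The 15 GB figure can be sanity-checked in a few lines. This sketch assumes the one-byte-per-element rule of thumb above and uses NumPy's bool dtype (which does store one byte per element) as a stand-in; Milvus's actual internal layout and any per-row overhead are not modeled.

```python
import numpy as np

NUM_STORES = 3000
NUM_ITEMS = 5_000_000

# NumPy stores each bool in one byte, so a 3,000-element mask is 3,000 bytes.
# Assumption: the BOOL array field costs roughly the same per element.
row = np.zeros(NUM_STORES, dtype=bool)
bytes_per_item = row.nbytes

total_bytes = bytes_per_item * NUM_ITEMS
print(f"{bytes_per_item} bytes/item -> {total_bytes / 1e9:.1f} GB")
```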

I then run a Milvus search with a filter expression that restricts the results to a given store ID:

import numpy as np

# collection is assumed to be a loaded pymilvus Collection built from the schema above.
# Random 768-dimensional query vector, matching the dim of product_title_vector.
query_vector = np.random.random((1, 768)).astype(np.float32)

store_id = 1  # store IDs are 1-based; array indices are 0-based

filter_expr = f"store_availability[{store_id - 1}] == true"
print("Filter expression:", filter_expr)

search_params = {"metric_type": "L2", "params": {"nprobe": 16}}
search_results = collection.search(
    data=query_vector,
    anns_field="product_title_vector",
    param=search_params,
    limit=1,
    expr=filter_expr,
)

Spending 15 GB on a single metadata field seems resource-intensive to me. It performs well, but what happens if the number of stores grows to 40,000 in the future?
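A quick projection for the 40,000-store case, next to a bit-packed layout (one bit per store, as suggested in the comments). Both columns are raw payload estimates under those assumptions; index structures, segment overhead, and whether Milvus can store or filter a packed bitmap are not taken into account.

```python
import math

NUM_ITEMS = 5_000_000


def dense_bytes(num_stores: int) -> int:
    """One byte per store flag, as in the BOOL array estimate."""
    return num_stores


def bitmap_bytes(num_stores: int) -> int:
    """One bit per store flag, rounded up to whole bytes."""
    return math.ceil(num_stores / 8)


for stores in (3_000, 40_000):
    dense_gb = dense_bytes(stores) * NUM_ITEMS / 1e9
    bitmap_gb = bitmap_bytes(stores) * NUM_ITEMS / 1e9
    print(f"{stores:>6} stores: dense {dense_gb:6.1f} GB, bitmap {bitmap_gb:5.2f} GB")
```

With 40,000 stores the one-byte-per-flag layout grows to about 200 GB of raw payload, while one bit per store would need about 25 GB.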

I tried a sparse-vector representation, but most products are available in about 80% of the stores, so it still consumes a lot of space.
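Why the sparse representation does not help at this density can be shown with a rough estimate. The per-entry cost below (a 32-bit index plus a 32-bit float value) is my assumption about a typical sparse-vector encoding, not a measured Milvus figure.

```python
NUM_STORES = 3000
AVAILABILITY = 0.80  # share of stores carrying a typical product (from the question)

# Assumption: each non-zero sparse entry costs a 4-byte index plus a 4-byte float value.
BYTES_PER_SPARSE_ENTRY = 4 + 4

nonzeros = round(NUM_STORES * AVAILABILITY)
sparse_bytes = nonzeros * BYTES_PER_SPARSE_ENTRY
dense_bytes = NUM_STORES  # one byte per BOOL element

print(f"sparse: {sparse_bytes} bytes/item vs dense BOOL array: {dense_bytes} bytes/item")
```

Under that assumption a sparse entry costs eight times as much as a one-byte flag, so sparse storage only pays off when fewer than 1/8 (12.5%) of the stores carry a product; at 80% it is several times larger than the dense array.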

Any suggestions for making this more space-efficient, while still filtering with a filter expression (expr) inside the search, would be a great help!

  • For 3,000 stores, you should be able to store that information in 375 bytes (one bit per store). No idea how that can be represented in your database and implemented in your filtering, though. Maybe bitwise operations? Commented May 30, 2024 at 9:05
  • OK, when you say 375 bytes, I assume you mean storing it bitwise. Please correct me if I misunderstood. Commented Jun 3, 2024 at 12:11
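The bitwise idea from the comments can be sketched client-side: 3,000 stores pack into 3,000 / 8 = 375 bytes, and checking one store is a single shift-and-mask. The helper names pack_availability and is_available are hypothetical, and I have not confirmed that Milvus filter expressions support bitwise operators, so this only shows the encoding and a check you could run outside the database (for example, to post-filter results).

```python
import numpy as np

NUM_STORES = 3000


def pack_availability(available: np.ndarray) -> bytes:
    """Pack a boolean mask (one entry per store) into a bitmap, 8 stores per byte.

    bitorder="little" puts store index i in byte i // 8, bit i % 8.
    """
    return np.packbits(available.astype(bool), bitorder="little").tobytes()


def is_available(bitmap: bytes, store_id: int) -> bool:
    """Test the bit for a 1-based store ID, matching store_availability[store_id - 1]."""
    index = store_id - 1
    return bool((bitmap[index // 8] >> (index % 8)) & 1)


# Hypothetical product stocked everywhere except stores 2 and 3000.
mask = np.ones(NUM_STORES, dtype=bool)
mask[[1, 2999]] = False

bitmap = pack_availability(mask)
print(len(bitmap), is_available(bitmap, 1), is_available(bitmap, 2))
```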
