How to maintain product availability in Elasticsearch?

Hey everyone,

I'm setting up an eCommerce search system on Elasticsearch with about 6 million product listings across 3000 stores. I need advice on storing availability data efficiently within ES. This data changes frequently, at about 100-200 updates per second, and I'm considering bulk ingestion to handle it. Storing availability within ES is crucial because we use pagination: if availability were checked outside ES after the search, unavailable products would have to be dropped from each page, so the number of documents per page could come out wrong.
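For the bulk ingestion part, here is a minimal sketch of how a batch of availability changes could be turned into a request body for the `_bulk` API using partial `update` actions. The index name `products`, the use of `product_id` as the document `_id`, and the helper name are assumptions for illustration, not something from my actual setup.

```python
import json


def build_bulk_update_body(updates, index="products"):
    """Build an NDJSON body for the _bulk API.

    `updates` is an iterable of (product_id, store_ids) pairs.
    Assumption: each document's _id is the product_id, and the
    index is called "products" (both hypothetical).
    """
    lines = []
    for product_id, store_ids in updates:
        # Action line: partial update of one document.
        lines.append(json.dumps({"update": {"_index": index, "_id": str(product_id)}}))
        # Payload line: replace the availability array with the new list.
        lines.append(json.dumps({"doc": {"availability": sorted(store_ids)}}))
    # The bulk body is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_update_body([(1001, [234, 123]), (1002, [])]), end="")
```

Collecting one or two seconds of changes into a single body like this would turn 100-200 individual updates per second into a handful of bulk requests.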

The search scenario involves looking for products like "iphone 15 pro" using text matching and filtering based on metadata like category, color, etc. I aim to display only available products, whether they're available in at least one location or specifically at a certain store (e.g., store_id 123).

To address this, storing all store IDs where a product is available as an array seems simple, but it would consume 40 GB of storage.

{
  product_name: <String>,
  product_id: <Long>,
  availability: [123,234,432,3425,321 ....] // array length can be up to 3000
}
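With this array layout, both search cases can be expressed as ordinary filters. Below is a rough sketch that builds the query body as a plain dict: a `term` filter on `availability` for a specific store, and an `exists` filter for "available in at least one store" (an empty array counts as no value for `exists`). The function name and the `color` field in the usage example are hypothetical.

```python
def build_search_query(text, store_id=None, extra_filters=None):
    """Build a search body that only returns available products.

    store_id=None -> available in at least one store.
    store_id=123  -> available at that specific store.
    """
    filters = list(extra_filters or [])
    if store_id is None:
        # Matches documents whose availability array has at least one value.
        filters.append({"exists": {"field": "availability"}})
    else:
        # A term filter on an array field matches if any element equals the value.
        filters.append({"term": {"availability": store_id}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"product_name": text}}],
                "filter": filters,
            }
        }
    }


if __name__ == "__main__":
    import json
    query = build_search_query(
        "iphone 15 pro",
        store_id=123,
        extra_filters=[{"term": {"color": "black"}}],  # hypothetical metadata field
    )
    print(json.dumps(query, indent=2))
```

Because the availability check sits in the filter clause of the same query, pagination counts only available products.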

I'm exploring alternatives like one-hot encoding or a bitset, where each array index represents a store ID and the set/unset bit indicates availability. However, I'm unsure how to formulate queries at runtime with this method.

{
  product_name: <String>,
  product_id: <Long>,
  availability: [0,0,1,1,1,0,1,0,0 ....]
                 | | | | | | | | |
                 0,1,2,3,4,5,6,7,8  <- array index, one position per store (store ids 1-3000)
}
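To make the bitset idea concrete, here is a small sketch of the encoding itself, outside Elasticsearch. It assumes store ids run from 1 to 3000 and that store id `s` maps to bit position `s - 1`; that mapping and the function names are assumptions. It only shows how the bits would be packed and tested, not how such a test would be run inside an ES query, which is the part I'm unsure about.

```python
def encode_bitset(available_store_ids, num_stores=3000):
    """Pack availability into bytes, one bit per store.

    Assumption: store id s (1-based) is stored at bit position s - 1.
    """
    bits = bytearray((num_stores + 7) // 8)  # 3000 stores -> 375 bytes
    for store_id in available_store_ids:
        if not 1 <= store_id <= num_stores:
            raise ValueError(f"store id out of range: {store_id}")
        pos = store_id - 1
        bits[pos // 8] |= 1 << (pos % 8)
    return bytes(bits)


def is_available(bits, store_id):
    """Return True if the bit for this store id is set."""
    pos = store_id - 1
    return bool(bits[pos // 8] & (1 << (pos % 8)))


def available_anywhere(bits):
    """Return True if at least one store bit is set."""
    return any(bits)


if __name__ == "__main__":
    packed = encode_bitset([1, 9, 3000])
    print(len(packed), is_available(packed, 9), is_available(packed, 2))
```

Packed this way, each product needs 375 bytes for 3000 stores, which comes to roughly 2.25 GB of raw bits across 6 million products before any index overhead.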

I'd really appreciate any insights or optimal solutions—whether it's altering data storage or any other approach I might not be aware of. Thanks!
