Hi everyone,
We develop an e-commerce search on Elasticsearch 8.x (Java client), with three indices: product, category, brand. Roughly 50k products per day arrive via API, spread over multiple deliveries.
Each refresh should add/update docs and remove products that are missing from the new dataset. If a field disappears from the source, it should also be removed in ES. We need zero downtime / consistent reads during the refresh. We can add Redis if a version flag helps.
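One thing we realized: the field-removal requirement effectively forces full-document indexing (the index operation) rather than partial updates, since an update would keep stale fields around. Roughly what our bulk write path looks like, just a sketch using the co.elastic.clients Java API client; `Product` and the index name are placeholders:

```java
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.BulkRequest;
import co.elastic.clients.elasticsearch.core.BulkResponse;

import java.io.IOException;
import java.util.List;

public class ProductIndexer {

    // Index each product as a full document: any field missing from the new
    // payload is gone after the write, which covers the "field disappears in
    // the source" requirement. A partial update would keep stale fields.
    static void indexBatch(ElasticsearchClient es, String index, List<Product> batch)
            throws IOException {
        BulkRequest.Builder bulk = new BulkRequest.Builder();
        for (Product p : batch) {
            bulk.operations(op -> op.index(i -> i
                .index(index)
                .id(p.id())          // stable business key, e.g. productId
                .document(p)));      // whole-document replace, not a partial update
        }
        BulkResponse rsp = es.bulk(bulk.build());
        if (rsp.errors()) {
            // A bulk response can be partially successful; inspect rsp.items().
        }
    }
}
```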
We have found two approaches:
- Blue/Green with alias: create a new index (same mappings/settings), bulk index the full new dataset, then atomically swap the read alias; drop the old index afterward.
  Questions: pitfalls with the alias swap and long-running PIT/scroll contexts? How to handle mapping changes between versions? (See the first sketch after this list.)
- Single index + version field: ingest the new dataset with a higher version stored in each doc; the app reads current_version from Redis and adds it as a query filter; later, delete-by-query the older versions.
  Questions: is it better to use productId#version as the _id so both versions coexist, or to overwrite the same _id and rely on a full upsert? Any gotchas with delete-by-query (DBQ) at this scale? (See the second sketch after this list.)
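First sketch (approach 1): the swap itself would be a single atomic updateAliases call. Placeholder names throughout (alias and generation indices passed in as arguments):

```java
import co.elastic.clients.elasticsearch.ElasticsearchClient;

import java.io.IOException;

public class AliasSwapper {

    // Atomically repoint the read alias from the old generation to the new
    // one. Both actions are applied in a single cluster-state update, so
    // readers always resolve the alias to exactly one index.
    static void swap(ElasticsearchClient es, String alias,
                     String oldIndex, String newIndex) throws IOException {
        es.indices().updateAliases(u -> u
            .actions(a -> a.remove(r -> r.index(oldIndex).alias(alias)))
            .actions(a -> a.add(ad -> ad.index(newIndex).alias(alias))));

        // Deleting the old index immediately invalidates any PIT/scroll
        // contexts still open against it; in practice we'd delay this until
        // in-flight searches have drained (or keep the old index one cycle).
        es.indices().delete(d -> d.index(oldIndex));
    }
}
```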
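Second sketch (approach 2): the read path filters on the version fetched from Redis (lookup not shown), and cleanup is an async delete-by-query. The `version` field name and the `Product` POJO are again placeholders:

```java
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._types.Conflicts;
import co.elastic.clients.elasticsearch.core.SearchResponse;

import java.io.IOException;

public class VersionedReads {

    // Read path: every query carries a filter on the current version, which
    // the app fetches from Redis before searching.
    static SearchResponse<Product> search(ElasticsearchClient es, String userQuery,
                                          long currentVersion) throws IOException {
        return es.search(s -> s
            .index("product")
            .query(q -> q.bool(b -> b
                .must(m -> m.match(t -> t.field("name").query(userQuery)))
                .filter(f -> f.term(t -> t.field("version").value(currentVersion))))),
            Product.class);
    }

    // Cleanup: remove every doc that is not on the current version. DBQ is a
    // scan plus per-doc delete, so at this scale we'd run it as a background
    // task and tolerate version conflicts from overlapping writes.
    static void purgeOldVersions(ElasticsearchClient es, long currentVersion)
            throws IOException {
        es.deleteByQuery(d -> d
            .index("product")
            .query(q -> q.bool(b -> b
                .mustNot(m -> m.term(t -> t.field("version").value(currentVersion)))))
            .conflicts(Conflicts.Proceed)     // skip docs updated mid-scan
            .waitForCompletion(false));       // run asynchronously as a task
    }
}
```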
Which approach would you pick for this volume and update pattern, and why? War stories and pitfalls to watch out for are very welcome.
Thank you.