Best option for bulk refresh: alias blue/green vs single-index + version filter

Hi everyone,

We are building e-commerce search on Elasticsearch 8.x (Java client) with three indices: product, category and brand. About 50k products per day arrive via an API, delivered in several batches a day.

We want to add or update documents and remove products that are missing from the new dataset. If a field disappears in the source, it should also be removed in ES. We need zero downtime and consistent reads during a refresh. We can add Redis if a version flag helps.
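
On the field-removal requirement: in the bulk API, an `index` action replaces the whole stored document, whereas an `update` with a partial `doc` merges into the existing source and would keep fields that vanished upstream. Below is a minimal sketch of building an NDJSON `_bulk` body with `index` actions, assuming each source document is already serialized as single-line JSON. The index name `products-v2` and the ids are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: builds an Elasticsearch _bulk NDJSON body using the "index" action.
// "index" replaces the whole stored document, so a field missing from the new
// source is gone after the write. Index name and ids are hypothetical.
final class BulkBodyBuilder {

    // Minimal JSON string escaping for ids (assumes no control characters).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // docs: _id -> already-serialized, single-line JSON source.
    static String build(String index, Map<String, String> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            // Action/metadata line.
            sb.append("{\"index\":{\"_index\":").append(jsonString(index))
              .append(",\"_id\":").append(jsonString(e.getKey())).append("}}\n");
            // Source line; the body must end with a newline.
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("p-1", "{\"name\":\"Shoe\",\"price\":49.9}");
        System.out.print(build("products-v2", docs));
    }
}
```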

We are considering two approaches:

  1. Blue/green with an alias: create a new index (same mappings/settings), bulk index the full new dataset, then atomically swap the read alias and drop the old index afterward.
    Questions: what are the pitfalls of the alias swap with long-running PITs/scrolls? How should mapping changes between versions be handled?

  2. Single index + version field: ingest the new dataset with a higher version (stored in each doc; the app reads current_version from Redis and adds a filter), then delete-by-query the older versions.
    Questions: should we use productId#version as the _id to keep both versions side by side, or overwrite the same _id and rely on a full upsert? Any gotchas with delete-by-query (DBQ) at this scale?
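
An in-memory model of Option 2 may help show the bookkeeping involved. It assumes `productId#version` as the `_id`. With that choice, old and new copies coexist, so readers filtering on the published version keep a complete result set during ingest, and a product missing from the new feed disappears the moment the version is flipped. Overwriting the same `_id` instead would replace old-version documents mid-ingest, so a reader still filtering on the old version would see products vanish until the flip. The map stands in for the index and the `AtomicLong` for the Redis key; all names are hypothetical. In real Elasticsearch you would also need to refresh the index before publishing the new version, since documents are not searchable until a refresh. The index also temporarily holds both copies, and deleted documents keep using space until segment merges reclaim it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory model of "single index + version field" (hypothetical names).
// The map plays the role of the ES index; currentVersion plays the Redis key.
final class VersionedIndex {
    record Doc(String productId, long version, String name) {}

    // _id = productId#version, so old and new copies coexist during ingest.
    private final Map<String, Doc> index = new ConcurrentHashMap<>();
    private final AtomicLong currentVersion = new AtomicLong(0);

    // products: productId -> name, the full dataset for this version.
    void ingest(long version, Map<String, String> products) {
        products.forEach((id, name) -> index.put(id + "#" + version, new Doc(id, version, name)));
    }

    // Every search must filter on the published version
    // (in ES, roughly a term filter on the version field).
    List<Doc> search() {
        long v = currentVersion.get();
        List<Doc> hits = new ArrayList<>();
        for (Doc d : index.values()) {
            if (d.version() == v) hits.add(d);
        }
        return hits;
    }

    // Flip readers to the new version (the Redis write in the real setup).
    void publish(long version) {
        currentVersion.set(version);
    }

    // Counterpart of delete-by-query with a range "lt" on the version field.
    int deleteOlderThan(long version) {
        int before = index.size();
        index.values().removeIf(d -> d.version() < version);
        return before - index.size();
    }

    public static void main(String[] args) {
        VersionedIndex idx = new VersionedIndex();
        idx.ingest(1, Map.of("p1", "Shoe", "p2", "Hat"));
        idx.publish(1);
        idx.ingest(2, Map.of("p1", "Shoe v2")); // p2 is missing from the new feed
        System.out.println(idx.search().size()); // readers still see version 1
        idx.publish(2);
        System.out.println(idx.search().size()); // p2 is gone for readers
        System.out.println(idx.deleteOlderThan(2));
    }
}
```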

Which approach would you pick for this volume and update pattern, and why? War stories and pitfalls to watch out for are very welcome.

Thank you.

Hi @alper

Big questions 🙂

As you already seem to be aware, it's all in the details...

Some of the biggest eCommerce customers I work with use some variant of your Option 1.

One customer also does blue/green on the client side, so that all client requests against the "old"/blue index drain out and finish before the swap. I believe they actually use two aliases, abstracted at the client level, so they can run blue and green side by side while blue drains out. That gives them zero downtime, but it requires an additional layer of abstraction.
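
A rough sketch of that two-alias, client-side drain, assuming a hypothetical router class in the application: new searches go to the active alias, and a per-alias in-flight counter tells you when the old side has drained. The `Function` passed to `search` stands in for the real query call, and the alias names `products-blue`/`products-green` are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical client-side router over two aliases; one is "active".
final class DrainingRouter {
    private final AtomicReference<String> active;
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    DrainingRouter(String initialAlias) {
        active = new AtomicReference<>(initialAlias);
    }

    // Runs `call` against the active alias while counting it as in flight.
    <T> T search(Function<String, T> call) {
        while (true) {
            String alias = active.get();
            AtomicInteger counter = inFlight.computeIfAbsent(alias, a -> new AtomicInteger());
            counter.incrementAndGet();
            if (!alias.equals(active.get())) {
                // Switched between read and increment: retry on the new alias.
                counter.decrementAndGet();
                continue;
            }
            try {
                return call.apply(alias);
            } finally {
                counter.decrementAndGet();
            }
        }
    }

    // Points new requests at newAlias; returns the previous alias.
    String switchTo(String newAlias) {
        return active.getAndSet(newAlias);
    }

    boolean drained(String alias) {
        AtomicInteger c = inFlight.get(alias);
        return c == null || c.get() == 0;
    }

    // Polls until the old alias has no in-flight requests, or gives up.
    void awaitDrained(String alias, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!drained(alias)) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("still busy: " + alias);
            }
            Thread.sleep(10);
        }
    }
}
```

The re-check after incrementing covers a request that read the old alias just before the switch: it is either counted before the drain check runs or retried on the new alias.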

My other customer does not have long-running PITs etc., so they use the normal single level of abstraction, i.e. one alias, and do a "quick cutover": pause the queue, let in-flight requests finish, switch the alias over, and continue.
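
For the single-alias quick cutover, the swap itself is one `POST /_aliases` call whose `remove` and `add` actions are applied atomically, so searches on the alias see either the old or the new index. A minimal sketch that builds that body and orders the cutover steps is below. It assumes hypothetical hooks for pausing/resuming the queue and for sending the HTTP request. The index and alias names are made up and are not JSON-escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a single-alias "quick cutover" (hook names are hypothetical).
final class AliasSwap {

    // Body for POST /_aliases; both actions are applied atomically.
    static String body(String alias, String oldIndex, String newIndex) {
        return "{\"actions\":["
            + "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
            + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}"
            + "]}";
    }

    // Pause, drain, swap, then always resume, even if the swap throws.
    static void quickCutover(Runnable pauseQueue, Runnable awaitInFlight,
                             Runnable swapAliases, Runnable resumeQueue) {
        pauseQueue.run();
        try {
            awaitInFlight.run();
            swapAliases.run(); // e.g. send body(...) to POST /_aliases
        } finally {
            resumeQueue.run();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        quickCutover(() -> log.add("pause"), () -> log.add("drain"),
                     () -> log.add(body("products", "products-v1", "products-v2")),
                     () -> log.add("resume"));
        log.forEach(System.out::println);
    }
}
```

On the PIT/scroll question: a PIT or scroll opened through the alias is bound to the concrete index when it is opened, so it should keep reading the old index after the swap, and deleting that index will break it. Dropping the old index only after those contexts have closed or expired avoids this.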

This is not to say Option 2 is invalid. It seems workable, but it involves a lot of "bookkeeping" that could get out of sync.

I have been using some version of Option 1 with RDBMSs for 20 years; it is a pretty common approach.

Let us know where you end up.