Reindexing data in a cluster serving live traffic


We currently have a cluster with 3 master nodes, a set of client (coordinating) nodes, and 5 data nodes, serving live traffic.
We are planning to expand the cluster to 15 data nodes because we are migrating another feature, which is pretty big, with lots of data that needs to be moved over.

At first, we thought of adding the 10 extra nodes to the cluster and ingesting the new feature's data through the existing client nodes. But we felt this might affect the existing features ES is serving, since the data nodes and client nodes would be shared.

What we then decided was: the 10 new data nodes get a custom node attribute, and the new index is allocated only to those 10 nodes via allocation filtering. The old features will keep being served from the original 5 nodes (we'll exclude the shards of those existing indices from being allocated to the newer data nodes).
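For what it's worth, this plan maps directly onto Elasticsearch's shard allocation filtering. A minimal sketch, assuming an attribute name of `box_type` and index names `new_feature_index` / `old_index` (both are placeholders, not your real names):

```
# elasticsearch.yml on each of the 10 new data nodes
node.attr.box_type: new_feature

# Pin the new index to those nodes only
PUT new_feature_index/_settings
{ "index.routing.allocation.include.box_type": "new_feature" }

# Keep an existing index's shards off the new nodes
PUT old_index/_settings
{ "index.routing.allocation.exclude.box_type": "new_feature" }
```

The exclude setting would need to be applied to each existing index (or via an index template / wildcard) before the new nodes join, otherwise the cluster may start rebalancing old shards onto them immediately.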

For ingestion, we'll hit the new data nodes directly rather than going through the shared client nodes (or maybe add dedicated client nodes just for this ingestion).
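Hitting the data nodes directly just means pointing the bulk client at those hosts. A minimal stdlib-only sketch, assuming hypothetical node addresses and an index named `new_feature` (the `_bulk` NDJSON format itself is the standard Elasticsearch one):

```python
import json
import urllib.request

# Hypothetical addresses of the new data nodes; round-robin over
# these to spread ingest load instead of going via client nodes.
NEW_DATA_NODES = ["http://10.0.0.11:9200", "http://10.0.0.12:9200"]


def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body: one action line plus one source
    line per document, terminated by a trailing newline (required
    by the _bulk API)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def send_bulk(host, body):
    """POST one bulk body to a single data node."""
    req = urllib.request.Request(
        host + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In practice you'd batch a few MB per bulk request and run several of these senders in parallel, one per target node.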

Do you think this is fine, or is it over-engineering? For context, the new feature has almost 2 TB of data, so we're planning to ingest as fast as we can.
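If raw ingest speed is the goal, the usual trick is to relax refresh and replication on the target index during the load and restore them afterwards. A sketch, again assuming the index name `new_feature_index` is a placeholder:

```
# Before the bulk load: disable refresh, drop replicas
PUT new_feature_index/_settings
{
  "index.refresh_interval": "-1",
  "index.number_of_replicas": 0
}

# After the load finishes: restore normal settings
PUT new_feature_index/_settings
{
  "index.refresh_interval": "1s",
  "index.number_of_replicas": 1
}
```

The trade-off is that with zero replicas, a node loss during the load means re-ingesting that data, which may be acceptable for a one-off migration.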
