Our system needs to reindex potentially 1M+ records several times per day.
To improve indexing time, we set the number of replicas to 0 while indexing. Once indexing is done, we call a flush and then set the replicas back to 1.
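As a rough sketch of that sequence, here are the REST calls written out as data in Python. The index name used in the usage note and the helper names `replica_settings` and `reindex_plan` are made up for illustration; the bulk indexing step itself is not shown.

```python
def replica_settings(count):
    """Body for PUT /<index>/_settings that changes the replica count."""
    return {"index": {"number_of_replicas": count}}


def reindex_plan(index):
    """Return the ordered (method, path, body) steps described above.

    Hypothetical helper: it only describes the calls, it does not send them.
    """
    return [
        # 1. Drop replicas before the bulk load.
        ("PUT", f"/{index}/_settings", replica_settings(0)),
        # (bulk indexing happens here, not shown)
        # 2. Flush once indexing is done.
        ("POST", f"/{index}/_flush", None),
        # 3. Add the replica back.
        ("PUT", f"/{index}/_settings", replica_settings(1)),
    ]
```

For example, `reindex_plan("records_v2")` gives three steps, the last of which is the settings change that triggers the replica build.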
When the new replica is created, Marvel shows the index in stage "TRANSLOG", recovering from one node to another. This lasts for some time, often over an hour, and the cluster is yellow during the process.
I'm confused as to why the replica needs to play back the transaction log in order to be generated. The index hasn't changed when the replica is made (it hasn't even been made an active index in our production system at the time the replica is added).
Is this the normal process for creating a replica? If so, is there some more efficient process that can be used?
Our cluster is basic: 4 proc VMs w/ 12GB RAM (6GB allocated to the JVM) and SAN storage (so it isn't as fast as SSDs or local 15k storage; ~100MB/s throughput). However, we can't do much about the setup we have.
That's not the reason why it's yellow; it's yellow because there are replica shards in a non-STARTED state.
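To illustrate how the colour follows from shard states, here is a toy model in Python. It is a simplification and not Elasticsearch's actual code: `cluster_status` is a made-up function, and it treats every state other than STARTED as inactive.

```python
def cluster_status(shards):
    """Derive a health colour from (is_primary, state) pairs.

    Simplified model: any non-STARTED primary means red, any
    non-STARTED replica means yellow, otherwise green.
    """
    if any(is_primary and state != "STARTED" for is_primary, state in shards):
        return "red"
    if any(not is_primary and state != "STARTED" for is_primary, state in shards):
        return "yellow"
    return "green"
```

With a started primary and a replica that is still initializing, this returns "yellow", which is the situation while the new replica is being built.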
When we index a document, it gets sent to the primary shard; the entire document is then sent to the replica shard and reindexed from scratch. We do not simply send the indexed outcome from the primary to the replica.
When you add a new replica, we create it using the translog, as it holds the complete action that was performed on each document.
So to confirm, what you're saying is that this is a normal process to be expected when adding a replica to an existing index, and there isn't any way to make it more efficient?
What is confusing is that we have seen a better result by simply indexing with a refresh interval of -1 and a replica count of 1.
This adds about 10% to the overall indexing time, but it avoids the translog stage and the cluster goes green more quickly, completing the replication.
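For reference, that alternative comes down to index settings like the following, sketched as Python dicts for the body of PUT /<index>/_settings. The helper names are made up, and restoring the refresh interval afterwards is an assumption on my part ("1s" is the Elasticsearch default), not something we have measured.

```python
def bulk_load_settings(replicas=1):
    """Settings while bulk indexing: refresh disabled, replica kept."""
    return {
        "index": {
            "refresh_interval": "-1",  # -1 disables periodic refresh
            "number_of_replicas": replicas,
        }
    }


def after_load_settings(refresh_interval="1s"):
    """Assumed follow-up step: turn periodic refresh back on after the load."""
    return {"index": {"refresh_interval": refresh_interval}}
```

Because the replica exists during the load, each document is indexed on both shards as it arrives, so there is no separate replica build at the end.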
I don't understand why the translog stage of adding a replica takes longer than the initial indexing.
We measured about 1M records indexed in 18 minutes (bulk), but the corresponding translog process takes almost 2 hours.
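To put rough numbers on that gap, here is the arithmetic in Python. It assumes exactly 1,000,000 records and treats "almost 2 hours" as 120 minutes, so the slowdown is an upper-bound estimate rather than a precise measurement.

```python
records = 1_000_000      # "about 1M", assumed exact for the estimate
bulk_minutes = 18        # measured bulk indexing time
translog_minutes = 120   # "almost 2 hours", taken as an upper bound

bulk_rate = records / (bulk_minutes * 60)          # docs per second
translog_rate = records / (translog_minutes * 60)  # docs per second
slowdown = translog_minutes / bulk_minutes         # how many times slower

print(f"bulk:     {bulk_rate:.0f} docs/s")
print(f"translog: {translog_rate:.0f} docs/s")
print(f"slowdown: {slowdown:.1f}x")
```

That works out to roughly 926 docs/s for the bulk load against about 139 docs/s for the translog stage, close to 6.7 times slower.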