I have a question regarding shard selection during index/bulk operations in Elasticsearch version 6.8.6.
In my cluster, I have three data nodes: A, B, and C. The shards of my index (which has no replicas) are evenly allocated across these nodes: shard 0 on A, shard 1 on B, and shard 2 on C.
I've noticed that when any of the data nodes crashes, index requests targeting its shard are still routed to that shard/node, causing timeout exceptions. It seems that the routing policy for index requests does not exclude the crashed node's shard from the selection process.
I would like to confirm the following:
1. Is my observation and analysis correct?
2. Why does the default routing policy still include the crashed node in shard selection? Is it because Elasticsearch aims to keep routing results consistent?
3. Is there any way to route a document to an available shard instead?
I appreciate any insights or suggestions you can provide. Thank you.
I have tried this and confirmed that I can index into a red-state index (but only into the online shards, of course).
The primary reason for keeping zero replicas is resource cost. We are not concerned if some nodes crash (typically, they recover quickly). During a node crash, the only operation affected is querying the offline shard, which is acceptable in our scenario. What troubles us is that all index requests routed to the offline shard wait until they time out, significantly reducing transactions per second (TPS), particularly for bulk requests: a single timed-out index operation within a bulk request causes the entire bulk to wait for the timeout reply.
In conclusion, it seems sensible to select only online shards as the index target when using automatically generated document IDs. As you mentioned, the document ID influences shard and node selection, so we could explore ways to ensure that generated document IDs map only to online shards.
You can override the hashing of the document ID that determines the target shard through document routing, which is probably easier than trying to assign appropriate IDs client-side. This also avoids the negative impact that custom IDs have on indexing performance.
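As a sketch of this idea: a client could keep a small pool of routing keys, know which shard each key hashes to, and pick one whose shard is currently online. Note that Elasticsearch actually uses a murmur3 hash of the `_routing` value modulo the number of primary shards; the md5 below is a stand-in just to keep the sketch dependency-free, and the key names are made up.

```python
import hashlib

NUM_PRIMARY_SHARDS = 3

def shard_for_routing(routing: str) -> int:
    # Stand-in for Elasticsearch's real formula,
    # murmur3(_routing) % number_of_primary_shards.
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PRIMARY_SHARDS

def pick_routing(online_shards, pool):
    """Return the first routing value in `pool` whose target shard is online."""
    for routing in pool:
        if shard_for_routing(routing) in online_shards:
            return routing
    raise RuntimeError("no routing value in the pool maps to an online shard")

# Example: node C is down, so shard 2 is offline; shards 0 and 1 are usable.
pool = [f"key-{i}" for i in range(32)]
routing = pick_routing({0, 1}, pool)
print("use routing", routing, "-> shard", shard_for_routing(routing))
```

The chosen routing value would then be attached to each index request (or bulk action) so the document lands on an online shard.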
I have lowered the default timeout to 10s, which still has a considerable impact on TPS, given that a normal request completes in under 1s. The situation is worse for bulk requests, since any single slow index operation causes the whole bulk to wait until the timeout.
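For what it's worth, the timeout can also be set per request rather than only as a default, and a routing value goes into each bulk action's metadata. A rough request shape (index name, type, ID, and routing key are placeholders):

```
POST /my-index/_doc/_bulk?timeout=1s
{ "index": { "_id": "1", "routing": "key-0" } }
{ "field": "value" }
```

This shortens how long a request to an offline shard blocks, though it does not stop the request from being routed there in the first place.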
It seems I will have to change the default behavior by rewriting the routing code.
Indeed, we face a trade-off between the advantage of low cost with some risk of data loss, and the higher cost (at least double) of maintaining replicas. Given the expense of doubling our footprint, we have to choose the first option; in our environment, a minimal loss of data is acceptable. I believe this is a reasonable and fairly common requirement in certain scenarios.
In any case, thank you for your understanding and your reply.
PS: Do you think it would be appropriate to open a feature request issue for this?