There are two items here:
- Why I need/want this.
- Can an API provide this?
What I'm asking for is #2. It seems very easy to provide, and I don't see a downside, though maybe I'm overlooking something that makes it a bad idea to expose this info.
As for #1, I can see why people are interested. It's complicated, since I'm trying to prove a theory. If my theory is correct, it could help failure-proof our scaling going forward.
I'll try not to write an essay here, so please bear with me while I just summarize why I want this.
This is tied to a question I posted here: Question regarding size of bulk action.
Suppose I have a cluster of 100 data nodes, and my biggest index also has 100 shards. How efficient is a bulk write?
If I can know the final destination shard of each document, I can optimize my bulk writes so that each request targets, say, just 5 shards, with 20 parallel bulk-writing tasks.
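To make the idea concrete, here's a rough client-side sketch of what I mean by "knowing the destination shard". It follows the formula from the _routing docs (`routing_factor = num_routing_shards / num_primary_shards`, `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, with `_routing` defaulting to `_id` and murmur3 as the hash). It assumes no custom `_routing` and no `routing_partition_size`, the byte encoding of the id is my reading of the internals, and it needs the third-party `mmh3` package, so treat it as illustrative rather than something I'd want to maintain in production, which is exactly why I'd prefer an official API:

```python
# Client-side sketch only -- requires the third-party mmh3 package
# (pip install mmh3). Not an official Elasticsearch API.
import mmh3

def es_shard_for(routing: str, num_primary_shards: int,
                 num_routing_shards: int | None = None) -> int:
    """Guess the destination shard the way the _routing docs describe it:
        routing_factor = num_routing_shards / num_primary_shards
        shard_num = (hash(_routing) % num_routing_shards) / routing_factor
    Assumes no custom _routing and no routing_partition_size."""
    if num_routing_shards is None:
        # Only valid when index.number_of_routing_shards equals the primary
        # shard count; on 7.x+ it usually defaults to a larger calculated
        # value, so pass the real setting for your index if it differs.
        num_routing_shards = num_primary_shards
    routing_factor = num_routing_shards // num_primary_shards
    # ES hashes the routing string (2 bytes per char, low byte first) with
    # murmur3 x86 32-bit, seed 0 -- my reading of Murmur3HashFunction.
    h = mmh3.hash(routing.encode("utf-16-le"), 0)
    # Python's % already floors like Java's Math.floorMod for negative hashes.
    return (h % num_routing_shards) // routing_factor
```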
As to @warkojm's question: I don't want to be responsible for shard balancing; that's not the point. I primarily need the shard number for updating a document, so when I read a doc it would be nice if the shard # were also included in the payload.
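For reads, the closest existing thing I've found is the _search_shards API: given a routing value (and routing defaults to the `_id`), it reports the shard group that value hashes to. Something like the snippet below works, but it costs an extra round trip per document, which is why I'd rather have the shard # simply included in the read payload (index name, id, and endpoint here are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # endpoint is illustrative

# Ask which shard group the routing value (default routing = _id) maps to.
resp = es.search_shards(index="my-index", routing="my-doc-id")
print(resp["shards"][0][0]["shard"])          # shard number of that copy group
```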
For document insertion, all I need is to randomly generate the _id, and the new API I'm requesting could tell me the destination shard. Then I can forward the doc to the appropriate ES writing task, etc.
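Here's a sketch of the writer-side flow I have in mind, building on the `es_shard_for()` helper above. All names and numbers are illustrative (`stream_of_docs` is a placeholder for whatever produces the documents), and `helpers.bulk` is just the stock Python client helper:

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch, helpers

INDEX = "my-big-index"          # illustrative
NUM_PRIMARY_SHARDS = 100        # must match the index's primary shard count
PARALLEL_WRITERS = 20

es = Elasticsearch("http://localhost:9200")   # endpoint is illustrative

def bucket_by_shard(docs):
    """Group documents by their computed destination shard."""
    buckets = {}
    for doc in docs:
        doc_id = str(uuid.uuid4())                         # randomly generated _id
        shard = es_shard_for(doc_id, NUM_PRIMARY_SHARDS)   # sketch from above
        buckets.setdefault(shard, []).append(
            {"_op_type": "index", "_index": INDEX, "_id": doc_id, "_source": doc}
        )
    return buckets

def write_bucket(actions):
    # One bulk request whose docs should all land on the same primary shard.
    return helpers.bulk(es, actions)

buckets = bucket_by_shard(stream_of_docs)   # stream_of_docs: placeholder doc source
with ThreadPoolExecutor(max_workers=PARALLEL_WRITERS) as pool:
    results = list(pool.map(write_bucket, buckets.values()))
```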
So my assumption is that once our cluster grows above 50 data nodes, bulk writes become very inefficient inside the ES cluster: too many sockets writing small payloads, and indexing queue overflow becomes an issue. We have a few large indices whose shard count equals the number of data nodes.
This is also another question of mine: how do folks run clusters of several hundred data nodes? I can see a large ES cluster serving many low-shard-count indices, but not large indices.
We started encountering queue overflow several months ago and ended up solving it by growing the bulk-write payload size. The downside is that the writer tasks come under RAM pressure (OOM exceptions), so there's a balancing act here; eventually there's a limit on how large a payload a bulk write can support.
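As an aside, for anyone fighting the same balancing act with the Python client: the streaming bulk helper lets you cap the in-flight request size instead of building one giant payload. It bounds the writer's memory but doesn't fix the fan-out problem; the numbers below are just examples we'd tune:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")   # endpoint is illustrative

# streaming_bulk flushes a request whenever either limit below is hit,
# so the writer never holds one giant payload in memory.
for ok, item in helpers.streaming_bulk(
    es,
    actions_iterator,                   # placeholder: a generator of action dicts
    chunk_size=2_000,                   # max docs per request
    max_chunk_bytes=50 * 1024 * 1024,   # max request size in bytes
    raise_on_error=False,               # report per-item failures instead of raising
):
    if not ok:
        print("failed:", item)          # handle/retry as needed
```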
Am I wrong here?
What I really want is for ES to handle this internally.
If ingest-only nodes could buffer the bulk write on local disk, sorted by destination shard, then I wouldn't need to worry about any of this. The only downside I can think of is increased indexing time. Data wouldn't get lost, since it would be held transiently on the ingest nodes. I'm guessing that's too much to ask for; exposing the shard number therefore seems like a good alternative, and an attainable one.