How get the just enough information when manually merge the search result by shards?

Shortly, I want to manually control which shard and when the search will run on, and merge the result myself. All depends on the shards preference API. That is how I think a workaround to give the long-term-running-background search a better user experience.
Now the problem is how can I reduce the communication cost if I need to manually merge the result from all shards.

the detail is what we start discuss on Task Management · Issue #15117 · elastic/elasticsearch · GitHub :

The query will retrieve total 100 shards on 10 elastic node, each node has 10 shards. We can get the result immediately when the first shard is done. Then the second shard is done, we reduce the result to the former, send notify to the API, so the user can update their view. When all the shards are done, send notify that the task is done.
So we can execute this long-term-running task in background with less thread one shard by another, release the CPU resources for the high-priority task. And the user can update the views of long-term-running task frequently, got a better user experience.We had many long-term task which cost minutes. Now the user can only wait it done totally, I want user can always see the view updated every serval seconds.

imotov suggest me:

you don't need task manager to do that. You can do it already by simply sending 100 requests with different shards preferences and then keep track of the responses on the client side. When you get 100th response - the "task" is done. Since sort keys are returned with every result you can easily merge sort the results as they are coming in. Personally, I don't think that the complexity of such solution and overhead of communication will make it worthwhile but all technical capability are already there, so you can give it a shot.

it is right, but the communication cost is the problem now:

You are right! This can pleasure most needs I want. I have checked out the shards preferences API. The shards preferences looks only set the shards (lucene's index) where the query retrieve from. The only API which can not be merged simply is cardinality, unless with a huge communication cost. Am I right? For the others, the cost looks like fine.

@wuranbo it depends. In some use cases the communication and retrieval overhead can be overwhelming. Let's say you would like to show 25 results on the page. If you let elasticsearch to do its thing, it will only have to pull 25 results from the disk thanks to the query_then_fetch execution strategy. However, if you will execute a request against each shard separately, you will get first 25 results that you can display as soon as the first shard finishes, but eventually elasticsearch will have to pull 2500 results from the disk since each shard doesn't know which records to pull and has to pull all top 25 in case one of them should be displayed in the top 25. So, unless your intention is to retrieve everything, the overhead of populating top N results can be quite big.
Since it turns into a search strategy discussion, let's move it to https://discuss.elastic.co/ if you have any other questions.

So I moved the discuss here.Thanks imotov very much again.

Now I read the query_then_fetch strategy, found something interesting:

The request is processed in two phases. In the first phase, the query is forwarded to all involved shards. Each shard executes the search and generates a sorted list of results, local to that shard. Each shard returns just enough information to the coordinating node to allow it merge and re-sort the shard level results into a globally sorted set of results, of maximum length size.

So how can I get the just enough information when I search with specific shard? Java api of myself writing plugin can do it? Where should I start?

Thanks for guys attention. Any clue and suggestion is great.

Why? It feels like you are reinventing the wheel.

Given the method, you are going to have to incur a cost somewhere.

@warkolm the reason:

I can execute this long-term-running task in background with less thread one shard by another, release the CPU resources for the high-priority task. And the user can update the views of long-term-running task frequently, got a better user experience.We had many long-term task which cost minutes. Now the user can only wait it done totally, I want user can always see the view updated every serval seconds.

I can build a proxy before elastic search, judging use this way or not, saving each task' internal result, the web can use the internal result.

wuranbo, instead of running the same query over and over again, you can register this query as a percolator and check all updated and new records against this percolator. If a new record matches a percolator you can send this record to the user who registered this percolator and plug the record into their search results. That would be a much more efficient way of implementing frequently updated result lists.