Significance of the two-phase "query then fetch" search when the default number of shards is 1

Is there any benefit to executing a search in two phases ("query then fetch") when the default number of shards is 1 (starting from 7.x)? Leave aside the cases that involve replicas.

That is probably the only scenario where executing the query in two phases may not bring much benefit. I believe this is quite a rare scenario, and one that generally performs well anyway, so I don't think the two phases add much overhead either. Optimizing this would therefore likely bring very little benefit but make the code more complex and harder to maintain, which IMHO seems like a bad tradeoff.
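To make the two phases concrete, here is a toy sketch in Python. It is not Elasticsearch code: the function names (`query_phase`, `fetch_phase`, `search`), the dict-based "shards", and the term-count scoring are all assumptions made for illustration. It only shows the general shape: each shard first returns lightweight (score, id) pairs, the coordinator picks the global top hits, and only then are the full documents loaded.

```python
from heapq import nlargest


def query_phase(shard, term, size):
    """Phase 1: a shard returns only (score, doc_id) pairs for its top hits.

    Scoring here is a toy term count, purely an assumption for illustration.
    """
    hits = [(text.count(term), doc_id)
            for doc_id, text in shard.items() if term in text]
    return nlargest(size, hits)


def fetch_phase(shards, winners):
    """Phase 2: load full documents only for the globally selected hits."""
    return [(doc_id, shards[shard_no][doc_id]) for shard_no, doc_id in winners]


def search(shards, term, size):
    """Coordinator: merge per-shard top hits, then fetch the winners."""
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in query_phase(shard, term, size):
            candidates.append((score, shard_no, doc_id))
    top = nlargest(size, candidates)
    return fetch_phase(shards, [(shard_no, doc_id) for _, shard_no, doc_id in top])
```

With a single shard the merge step in `search` has nothing to compare against, but the scoring and the document loading still both have to happen, which is why collapsing the phases would save very little in this sketch.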

Well, in 7.x the default number of shards is 1, and that decision was made because the majority of deployments are small or run with the defaults. On the same note, doesn't it make sense to skip performing the aggregation in multiple places (at least once on the data node and once on the coordinating node) in the default scenario?

I do not think you would gain much, as the same amount of work still needs to be done, so it would add complexity for virtually no gain. If you have a small cluster, it is also likely that the node serving the request also holds the shard, which means there is often not even a network hop to avoid, especially if you also use a suitable preference setting.
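As a rough illustration of why the work stays the same, here is a toy Python sketch of a terms-style count computed per shard and then merged by a coordinator. The names `shard_terms_agg` and `coordinator_reduce` are hypothetical, and this is not how Elasticsearch implements aggregations internally; it only mirrors the "once on the data node, once on the coordinating node" split discussed above.

```python
from collections import Counter


def shard_terms_agg(docs, field):
    """Data node step: count the values of `field` over one shard's documents."""
    return Counter(doc[field] for doc in docs if field in doc)


def coordinator_reduce(partials):
    """Coordinating node step: merge the per-shard partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

In this sketch the expensive part is `shard_terms_agg`, which must visit every matching document regardless of the shard count. With one shard, `coordinator_reduce` merges a single partial result, so removing it would save only a trivial amount of work.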

I do not think this rather theoretical discussion of a very rare case, which is usually very fast anyway, is very useful, so I will leave it here.