I'm not 100% sure what "load into memory" means in this context, or how it differs with and without ?preference=_primary. In both cases all shard copies (primary and replica) are running, so they consume a certain amount of heap, but if a shard copy (primary or replica) is never touched by any search or indexing then it probably won't be consuming any filesystem cache.
I mean heap or disk cache, any memory.
The scenario is a low volume of constant queries (with aggregations) against 100s-1000s of shards across 10s of nodes. The goal is to use only the minimal amount of memory, so other queries can benefit from what's left. Before _primary* it was possible to have nearly 2x the required memory in use, because queries could be sent to both primaries and replicas. (And I get that it's under 2x because of the per-shard heap overhead etc., but it's easier to just say 2x.) We both measured this ourselves and had it confirmed in feedback from Elastic. For concreteness, a rough sketch of what our old requests looked like is below (the index name and aggregation are made up; _primary was one of the preference values that has since been removed):
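```python
import requests

# What we used to do: pin every search to the primaries so the replicas
# stayed cold and the filesystem cache only held one copy of each shard's
# data. Index name and aggregation are hypothetical.
resp = requests.get(
    "http://localhost:9200/logs-2019.06.18/_search",
    params={"preference": "_primary"},  # removed in recent ES versions
    json={"size": 0, "aggs": {"hosts": {"terms": {"field": "host"}}}},
)
print(resp.json())
```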
Now _primary* has been removed and the suggestion is to use preference=foobar, which will consistently hash to the same set of shard copies for every query, as long as you keep using preference=foobar AND the cluster state stays the same. So it's "almost" the same as using _primary, except that instead of forcing Elasticsearch to use the primaries, you let it pick which copies to use rather than having a deterministic choice. As I understand the suggestion, it would look something like the sketch below (the preference string and query are made up; the key point is that the string is fixed and reused on every request):
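```python
import requests

# Hypothetical fixed, app-chosen preference string. Reusing the same string
# should route each shard's searches to the same copy, for as long as the
# cluster state doesn't change.
PREFERENCE = "memory-affinity-v1"

def search(index: str, body: dict) -> dict:
    resp = requests.get(
        f"http://localhost:9200/{index}/_search",
        params={"preference": PREFERENCE},  # same string on every request
        json=body,
    )
    return resp.json()

# Every call with the same preference should hit the same shard copies,
# keeping their caches warm while the other copies stay cold.
result = search("logs-*", {"size": 0, "aggs": {"hosts": {"terms": {"field": "host"}}}})
```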
I guess it comes down to trust, and maybe I should just trust that it won't go back to how it was before, where we could only cache half as much because everything was loaded twice.
My concern is around the language that this is cluster state based. Does creating a new index, which we constantly do with time-based indices, qualify as a cluster state change? That could tell the algorithm to pick new shard copies to send queries to, throwing away many warmed caches if it doesn't pick the same copies as before. To make the worry concrete, here's a toy sketch of my mental model (not Elasticsearch's actual routing code):
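```python
import hashlib

def pick_copy(preference: str, shard_id: str, copies: list) -> str:
    """Toy model: deterministically pick one copy of a shard for a preference."""
    h = int(hashlib.md5(f"{preference}:{shard_id}".encode()).hexdigest(), 16)
    return copies[h % len(copies)]

# Under this model the choice depends only on the preference string and the
# set of copies for that shard, so creating a new index wouldn't flip the
# selection for existing shards.
copies = ["node-1 (primary)", "node-3 (replica)"]
print(pick_copy("memory-affinity-v1", "logs-2019.06.18[0]", copies))
```

If the real algorithm instead feeds the whole cluster state into the selection, then adding an index could flip selections for existing shards too, and that's exactly what I'd like to have confirmed or ruled out.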