This article on tuning approximate kNN states "You should ensure that data nodes have at least enough RAM to hold the vector data and index structures".
Does this still apply when doing filtered kNN search?
To be clear: if filtering is being used to significantly reduce the amount of data being queried at any one time, would we still need enough RAM to hold the entirety of the data in memory?
My guess is yes, it still applies: shards are made up of segments, each holding an HNSW graph, and those graphs need to be in memory in their entirety for efficient querying, even if we're running filtered queries.
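To make the guess concrete, here is the rough arithmetic I have in mind, as a minimal sketch. It assumes float vectors at 4 bytes per component plus roughly `4 * m` bytes of graph links per vector, with `m` defaulting to 16; those figures are my reading of the tuning guidance, and the function name and example numbers are made up for illustration. Note that the filter does not appear anywhere in the estimate, which is exactly the assumption I'd like confirmed or corrected.

```python
def estimate_knn_ram_bytes(num_vectors: int,
                           num_dimensions: int,
                           hnsw_m: int = 16,
                           bytes_per_component: int = 4) -> int:
    """Back-of-envelope RAM estimate for vector data plus HNSW graph.

    Assumptions (not verified against any particular version):
    - raw vectors take num_vectors * num_dimensions * bytes_per_component
    - graph links take roughly num_vectors * 4 * hnsw_m
    - the whole structure stays resident regardless of any query filter
    """
    vector_bytes = num_vectors * num_dimensions * bytes_per_component
    graph_bytes = num_vectors * 4 * hnsw_m
    return vector_bytes + graph_bytes


# Hypothetical index: 10 million 768-dimensional float vectors.
total = estimate_knn_ram_bytes(10_000_000, 768)
print(f"{total / 1024**3:.1f} GiB")
```

If filtering really did reduce what has to be resident, I'd expect `num_vectors` here to shrink to the filtered subset, but I don't think that's how it works.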