We have a search index of 206,735 web pages (7 GB) stored on a two-node Found cluster with 5 shards. Our _id is the URL of the page, so does that mean all pages from the same domain get routed to the same shard?
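For reference, Elasticsearch's default routing picks a shard as hash(_routing) % number_of_primary_shards, and _routing defaults to the document's _id. The sketch below is a rough simulation of that rule under some assumptions. Elasticsearch itself uses Murmur3, but this uses zlib.crc32 as a stand-in because it is in the standard library. The URLs are hypothetical. The sketch compares hashing the full URL with hashing only the domain, which is what custom routing by domain would do.

```python
import zlib
from collections import Counter

NUM_SHARDS = 5  # the 5 shards mentioned in the question

def pick_shard(routing: str, num_shards: int = NUM_SHARDS) -> int:
    # Stand-in for Elasticsearch's Murmur3-based routing hash; crc32 is used
    # only because it is deterministic and available in the stdlib.
    return zlib.crc32(routing.encode("utf-8")) % num_shards

# Hypothetical URLs, all from the same domain.
urls = [f"http://example.com/page/{i}" for i in range(1000)]

# Routing on the full URL (the default when _id is the URL).
counts = Counter(pick_shard(url) for url in urls)

# Routing on the domain only (what a custom routing value of the domain would do).
domain_counts = Counter(pick_shard("example.com") for _ in urls)

print("by full URL:", sorted(counts.items()))
print("by domain:  ", sorted(domain_counts.items()))
```

Because the whole URL string is hashed, pages from one domain should spread across all five shards, while routing on the domain alone puts every page of that domain on a single shard.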
Would this cause problems with relevance scoring? I don't yet fully understand the sparse-statistics problem.
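As an illustration of why per-shard statistics matter, this sketch computes a term's IDF from each shard's local counts and compares it with the IDF computed over the whole index. It assumes the BM25 IDF form used by Lucene's BM25Similarity. Older clusters may use classic TF/IDF instead, but the effect is the same in kind. The per-shard document counts and document frequencies are made up, with the term deliberately concentrated on one shard.

```python
import math

def bm25_idf(doc_freq: int, doc_count: int) -> float:
    # IDF in the form used by Lucene's BM25Similarity.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical (doc_count, doc_freq) for a single term on each of 5 shards;
# the term is concentrated on shard 0.
shards = [
    (41_000, 900),
    (41_000, 20),
    (41_000, 25),
    (41_000, 15),
    (41_000, 40),
]

# IDF each shard would use if it scores with only its own statistics.
local_idfs = [bm25_idf(df, n) for n, df in shards]

# IDF computed from statistics over the whole index.
total_docs = sum(n for n, _ in shards)
total_df = sum(df for _, df in shards)
global_idf = bm25_idf(total_df, total_docs)

for i, idf in enumerate(local_idfs):
    print(f"shard {i}: local idf={idf:.2f}  global idf={global_idf:.2f}")
```

With these numbers, a matching document on shard 0 gets a noticeably lower IDF than an otherwise identical document on shard 1, so the two would score differently even though only their shard placement differs.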
Should we use custom routing to spread documents across shards more evenly, use fewer shards, or both?