We run a single search across multiple indices using simple_query_string. The problem is that the ranking is pretty awful: the top few results always come from one particular index. I've been told this is because idf is calculated separately for each index (strictly, for each shard), and the resulting scores are then compared across indices. As a result, indices where the query terms are rarer, and idf is therefore higher, show up higher.
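To make the skew concrete, here is a rough sketch using Lucene's BM25 idf formula, ln(1 + (N - n + 0.5) / (n + 0.5)). BM25 has been the Elasticsearch default similarity since 5.0. The term statistics below are made up, and doc_count is simplified to mean "documents in the index" (Lucene actually uses the number of documents that have the queried field). It compares the idf each index would use on its own with the idf computed from combined statistics.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for ONE query term in two indices.
# doc_count = documents in the index, doc_freq = documents containing the term.
stats = {
    "index_a": {"doc_count": 1_000, "doc_freq": 5},          # term is rare here
    "index_b": {"doc_count": 100_000, "doc_freq": 20_000},   # term is common here
}

# Per-index idf: each index scores its own documents with its own statistics.
per_index = {
    name: bm25_idf(s["doc_count"], s["doc_freq"]) for name, s in stats.items()
}

# Global idf: statistics summed across both indices, i.e. the kind of
# combined view that dfs_query_then_fetch collects before scoring.
total_docs = sum(s["doc_count"] for s in stats.values())
total_freq = sum(s["doc_freq"] for s in stats.values())
global_idf = bm25_idf(total_docs, total_freq)

for name, idf in per_index.items():
    print(f"{name}: per-index idf = {idf:.3f}, global idf = {global_idf:.3f}")
```

With these numbers a match in index_a is weighted roughly three times as heavily as the same match in index_b, purely because the term is rarer there. With the global idf, both indices use the same weight.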
Looking at the documentation, dfs_query_then_fetch (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch) computes idf across shards and uses that for ranking. The docs for it say not to use it in production. I assume it also computes idf across indices.
These indices share a set of 3-5 common fields (a bit like a base class in inheritance), and each one also has a number of its own specific fields.
One option is to just put everything in one index and leave fields blank where they don't apply.
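A rough sketch of what the single-index option could look like: one mapping holding the shared fields, every index-specific field, and a keyword field recording which kind of document each one is. All field names, the doc_type field, and the helper are hypothetical. The "mappings" → "properties" layout is the typeless mapping format used from Elasticsearch 7 onward.

```python
import json

# Hypothetical fields shared by every kind of document...
COMMON_FIELDS = {
    "title": {"type": "text"},
    "description": {"type": "text"},
    "created_at": {"type": "date"},
}

# ...and hypothetical fields specific to each former index.
SPECIFIC_FIELDS = {
    "product": {"price": {"type": "scaled_float", "scaling_factor": 100}},
    "article": {"author": {"type": "keyword"}, "body": {"type": "text"}},
}


def combined_mapping(common, specific):
    """Merge common and type-specific fields into one index mapping,
    plus a doc_type keyword field to tell the kinds apart."""
    properties = {"doc_type": {"type": "keyword"}, **common}
    for fields in specific.values():
        for name, definition in fields.items():
            # One index can hold only one definition per field name,
            # so two kinds declaring the same field differently conflict.
            if name in properties and properties[name] != definition:
                raise ValueError(f"conflicting definitions for field {name!r}")
            properties[name] = definition
    return {"mappings": {"properties": properties}}


mapping = combined_mapping(COMMON_FIELDS, SPECIFIC_FIELDS)
print(json.dumps(mapping, indent=2))
```

Documents don't have to fill in fields that don't apply to them; they can simply omit them. The doc_type field can then be used to filter or boost by kind.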
The other option is to use dfs_query_then_fetch.
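For the dfs_query_then_fetch option, the search type is set as a query parameter on the normal multi-index search request. This sketch only builds the path and JSON body; it does not send anything. The index names, field names, and the build_search_request helper are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_search_request(indices, query_text, fields, use_dfs=True):
    """Return (path, body) for a multi-index simple_query_string search,
    optionally asking for dfs_query_then_fetch scoring."""
    path = "/" + ",".join(indices) + "/_search"
    if use_dfs:
        # search_type is a URL parameter, not part of the JSON body.
        path += "?" + urlencode({"search_type": "dfs_query_then_fetch"})
    body = {
        "query": {
            "simple_query_string": {
                "query": query_text,
                "fields": fields,
            }
        }
    }
    return path, body


path, body = build_search_request(
    ["products", "articles"], "blue widget", ["title", "description"]
)
print("GET", path)
print(json.dumps(body, indent=2))
```

The only difference from the current search is the extra parameter. The cost is an additional round trip to every shard to collect term statistics before the query phase.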
Which approach is recommended, and what do people usually do in cases like this?