To be a little less hand-wavy (please correct me if I'm wrong): some
stats used in scoring, such as IDF, are computed per shard by default,
i.e. only from the documents that live in that one shard. This means
that the same document can be scored differently depending on which
shard it ends up in.
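
To make that concrete, here is a small sketch of how IDF for the same
term can differ between two shards and the index as a whole. The
formula is the classic Lucene-style 1 + ln(N / (df + 1)), which I am
assuming here for illustration; the shard sizes and document
frequencies are made up.

```python
import math

def idf(doc_count, doc_freq):
    # Lucene-style classic IDF (assumed here): 1 + ln(N / (df + 1))
    return 1.0 + math.log(doc_count / (doc_freq + 1))

# Hypothetical shards: (documents in the shard, documents containing the term)
shard_a = (1000, 9)    # the term is rare in this shard
shard_b = (1000, 99)   # the term is common in this shard

# Per-shard IDF: each shard only sees its own documents.
idf_a = idf(*shard_a)
idf_b = idf(*shard_b)

# Index-wide IDF: the counts are summed over all shards first.
idf_global = idf(shard_a[0] + shard_b[0], shard_a[1] + shard_b[1])

print(f"shard A: {idf_a:.3f}  shard B: {idf_b:.3f}  index-wide: {idf_global:.3f}")
```

A document matching the term would get a higher IDF contribution in
shard A than in shard B, even though nothing about the document itself
changed.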
By changing the search type (dfs_query_then_fetch instead of the
default query_then_fetch), you can change this behaviour so that the
stats are computed at the index level (not the shard level), i.e. from
the document set present in the entire index. This helps to score
consistently within one index.
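
As a rough sketch of what changing the search type looks like on the
wire: the search_type query parameter is set on the _search request.
The host, index name and query below are placeholders I made up, and
the code only builds the request, it does not send it.

```python
import json
from urllib.parse import urlencode

def build_search_request(host, index, query, search_type="dfs_query_then_fetch"):
    # search_type goes in the query string of the _search endpoint.
    url = f"{host}/{index}/_search?" + urlencode({"search_type": search_type})
    body = json.dumps({"query": query})
    return url, body

# Hypothetical host, index and query, for illustration only.
url, body = build_search_request(
    "http://localhost:9200",
    "articles",
    {"match": {"title": "elasticsearch"}},
)
print(url)
print(body)
```

The DFS variant needs an extra round trip to the shards to collect the
term stats before the query runs, which is where the performance cost
comes from.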
AFAIK there is no way to run cross-index queries accurately. You can
rely on the "evening out" that Clinton mentions, but in that case you
need to be careful that your routing doesn't skew the distribution of
the stats too much: if each shard receives very different data, the
stats will never even out. The default routing is fine, as it spreads
documents evenly across shards (using a hash of the document id).
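
A minimal sketch of the hash-based routing idea, to show why it spreads
documents evenly. Note that MD5 is only a stand-in here, not the hash
function Elasticsearch actually uses, and the document ids and shard
count are made up.

```python
import hashlib
from collections import Counter

def pick_shard(doc_id, num_shards):
    # Stand-in hash (MD5), NOT Elasticsearch's real routing hash.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

NUM_SHARDS = 5  # hypothetical shard count

# Route 10,000 made-up document ids and count how many land on each shard.
counts = Counter(pick_shard(f"doc-{i}", NUM_SHARDS) for i in range(10000))

for shard in range(NUM_SHARDS):
    print(f"shard {shard}: {counts[shard]} docs")
```

With a custom routing value (say, one per customer) the shard contents
can become much less uniform, which is the case where the per-shard
stats stop evening out.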
On Jul 20, 10:20 am, Clinton Gormley cl...@traveljury.com wrote:
On Thu, 2012-07-19 at 10:32 -0700, Ivan Brusic wrote:
Scoring might be different due to the distributed nature of
ElasticSearch. Try adjusting the search type:
There is a tradeoff between performance and accuracy of scoring.
Also, as the quantity of data you have grows, these differences tend to