Basically, every number where only FOO is matched gets a low score.
So the scores fall into two distinct clusters. Sadly, I don't know the proper terminology, but one cluster is wider and sits close to 3, while the second has very similar scores close to 0.01. Is there a way to instruct Elasticsearch to return only the first cluster in such a case?
I am happy to do all the reading, as well as learn/relearn the required math, so all I am asking for are good reads you can point me to.
Scores are computed on a number of factors, some of which vary over time as more content is added to the index. For this reason we don't suggest reading too much into what the scores mean (i.e. a score of 1 doesn't mean "perfection").
That said, if you want to consider the entire range of scores produced by a query and look at their distribution, the percentiles aggregation can be used to help draw that curve:
GET /MY_INDEX/_search
{
  "query": {
    ... MY QUERY HERE ...
  },
  "aggs": {
    "scoreDistribution": {
      "percentiles": {
        "script": "_score"
      }
    }
  }
}
As you can see, most of the results have a poor score, and there is a huge gap. Could you recommend a method for detecting this gap (I am not afraid of math)?
Perhaps a much simpler approach is to make all query terms required using the AND operator. In your query example that would be turned into a search for FOO AND 12345 as opposed to the default FOO OR 12345.
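As a rough sketch of what that could look like, assuming a `match` query is in use: the snippet below builds a request body with `"operator": "and"`. The field name `my_field` and the helper `build_and_query` are made up for illustration, so substitute whatever your mapping actually uses.

```python
import json


def build_and_query(field, text):
    """Build a search body in which every term of `text` must match.

    `field` is a placeholder for a real field in your index mapping.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # The default operator is "or"; "and" makes all terms required.
                    "operator": "and",
                }
            }
        }
    }


# Hypothetical field name; the query text comes from the example above.
body = build_and_query("my_field", "FOO 12345")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `GET /MY_INDEX/_search` request.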
The details can depend on which query type and index mapping you are using.
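If you still want to find the gap yourself, one simple client-side heuristic (not something Elasticsearch does for you) is to sort the scores and cut at the largest relative drop between neighbours. The sketch below assumes you have already collected the `_score` values from the hits; the function name, the `min_ratio` threshold of 10 and the sample scores are all illustrative choices, not values from a real index.

```python
import math


def split_at_largest_gap(scores, min_ratio=10.0):
    """Split scores into (top_cluster, rest) at the largest relative drop.

    If no neighbouring pair differs by at least `min_ratio`, everything is
    treated as a single cluster and `rest` is empty.
    """
    ranked = sorted(scores, reverse=True)
    cut, best = None, 0.0
    for i in range(len(ranked) - 1):
        higher, lower = ranked[i], ranked[i + 1]
        if higher <= 0:
            break  # nothing meaningful left to compare
        ratio = math.inf if lower <= 0 else higher / lower
        if ratio > best:
            cut, best = i + 1, ratio
    if cut is None or best < min_ratio:
        return ranked, []
    return ranked[:cut], ranked[cut:]


# Made-up scores shaped like the description: a wide cluster near 3
# and a tight cluster near 0.01.
sample = [3.4, 0.011, 2.7, 0.010, 3.1, 0.012, 2.2]
top, rest = split_at_largest_gap(sample)
print(top)
print(rest)
```

Because scores shift as the index changes, the cut is best recomputed per query rather than hard-coded. The lowest score in the top cluster could then be passed as `min_score` in a follow-up search request if you want the filtering done server-side.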