'Intelligently' cutting out results

slawosz · August 12, 2021, 11:20am

Hi,
let's say I have products with serial numbers in the format FOO-XXXXX, where FOO is always present and XXXXX are digits.

When I search for FOO-12345, the results include all products, for example (score in brackets):

FOO-12345 (3)
FOO-12344 (2.5)
FOO-42353 (0.01)
FOO-XXXX (0.01)

Basically, every serial number where only FOO matches gets a low score.

So basically, the scores fall into two distinct clusters. I don't know the proper terminology, but one cluster is wider and close to 3, and the second has very similar scores close to 0.01. Is there a way to tell Elasticsearch to return only the first cluster in such a case?
I am happy to do all the reading and to learn (or relearn) the required math, so all I am asking for is good reading material you can point me to.

Thanks

Mark_Harwood · August 12, 2021, 1:41pm

Hi Sławosz

Scores are computed on a number of factors, some of which vary over time as more content is added to the index. For this reason we don't suggest reading too much into what the scores mean (i.e. a score of 1 doesn't mean "perfection").
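Elasticsearch's default similarity is BM25, and one of its factors is the inverse document frequency (IDF) of each matched term. The sketch below assumes Lucene's BM25 IDF formula and leaves out the term-frequency, field-length, and boost factors, so it shows a trend rather than real scores; `bm25_idf` is a hypothetical helper, not an Elasticsearch API. A term such as FOO that occurs in every document gets an IDF close to zero, and both values shift as documents are added, which is one reason absolute scores drift over time.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF part of Lucene's BM25 similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    for n_docs in (100, 10_000):
        # "foo" appears in every serial number, so its doc_freq equals n_docs.
        idf_foo = bm25_idf(n_docs, n_docs)
        # A specific number such as "12345" appears in just one document.
        idf_number = bm25_idf(n_docs, 1)
        print(f"N={n_docs}: idf(foo)={idf_foo:.5f}, idf(12345)={idf_number:.3f}")
```

The near-zero IDF for the shared prefix is roughly why hits that match only FOO cluster at a tiny score, while the rare number term dominates the score of the exact match.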

That said, if you want to consider the entire range of scores produced by a query and look at their distribution, the percentiles aggregation can help you draw that curve:

GET /MY_INDEX/_search
{
  "query": {
    ... MY QUERY HERE ...
  },
  "aggs": {
    "scoreDistribution": {
      "percentiles": {
        "script": "_score"
      }
    }
  }
}

The results of my test query look like this:

  "aggregations" : {
    "scoreDist" : {
      "values" : {
        "1.0" : 5.383362350463867,
        "5.0" : 6.5666823387146,
        "25.0" : 7.337974548339844,
        "50.0" : 8.046831130981445,
        "75.0" : 9.65035629272461,
        "95.0" : 11.354881286621094,
        "99.0" : 14.382296962738014
      }
    }
  }
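To see how such percentile values relate to the raw scores, here is a small sketch that computes the same percentiles over an in-memory list using linear interpolation. Elasticsearch's percentiles aggregation only approximates these values (by default with the TDigest algorithm), so its numbers can differ slightly; `percentile` and `score_distribution` are hypothetical helpers, and the scores in the example are made up for illustration.

```python
def percentile(sorted_scores, p):
    """Percentile p (0-100) of an ascending list, using linear interpolation."""
    if not sorted_scores:
        raise ValueError("need at least one score")
    rank = (p / 100) * (len(sorted_scores) - 1)
    lower = int(rank)  # rank is non-negative, so int() floors it
    upper = min(lower + 1, len(sorted_scores) - 1)
    fraction = rank - lower
    return sorted_scores[lower] + (sorted_scores[upper] - sorted_scores[lower]) * fraction


def score_distribution(scores, percents=(1, 5, 25, 50, 75, 95, 99)):
    """Map each requested percent to a score, keyed like the aggregation output."""
    ordered = sorted(scores)
    return {str(float(p)): percentile(ordered, p) for p in percents}


if __name__ == "__main__":
    # Made-up scores: 90 weak "prefix only" hits and 10 strong hits.
    scores = [0.015] * 90 + [2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.1, 3.1]
    for key, value in score_distribution(scores).items():
        print(key, round(value, 3))
```

When most hits share the same score, every lower percentile collapses onto that value and only the top percentiles reveal the strong matches.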

slawosz · August 12, 2021, 3:04pm

Thanks Mark,
this is indeed very helpful. It's a step in a very good direction toward a potential solution.
I got something like this:

"aggregations" : {
    "scoreDistribution" : {
      "values" : {
        "1.0" : 0.015267470851540565,
        "5.0" : 0.015267470851540565,
        "25.0" : 0.015267470851540565,
        "50.0" : 0.015267470851540565,
        "75.0" : 0.015267470851540565,
        "95.0" : 2.449463472701605,
        "99.0" : 3.1063098907470703
      }
    }
  }

As you can see, most of the results have a poor score, and there is a huge gap. Could you recommend a method for detecting this gap (I am not afraid of math)?
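One simple client-side heuristic is to sort the returned scores, find the largest ratio between neighbouring scores, and keep only the hits above that drop when the ratio is large enough. This is a sketch, not an Elasticsearch feature: the function name `cut_at_largest_gap` and the `min_ratio` threshold of 10 are assumptions. A ratio is used instead of a difference because, as noted above, absolute score values shift as the index changes. In one dimension this is a very simple form of clustering; Jenks natural breaks optimization is a more principled relative.

```python
def cut_at_largest_gap(scores, min_ratio=10.0):
    """Return the scores above the largest relative drop, best first.

    Scores are sorted in descending order and each neighbouring pair is
    compared by ratio (higher / lower). If the largest ratio is at least
    `min_ratio`, everything below that drop is discarded; otherwise no
    clear gap exists and all scores are kept.
    """
    ordered = sorted(scores, reverse=True)
    if len(ordered) < 2:
        return ordered
    best_index, best_ratio = 0, 0.0
    for i in range(len(ordered) - 1):
        higher, lower = ordered[i], ordered[i + 1]
        if lower > 0:
            ratio = higher / lower
        else:
            # A drop from a positive score to zero counts as an infinite gap.
            ratio = float("inf") if higher > 0 else 1.0
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    if best_ratio < min_ratio:
        return ordered
    return ordered[: best_index + 1]


if __name__ == "__main__":
    # Scores from the example in the question (score in brackets).
    print(cut_at_largest_gap([3, 2.5, 0.01, 0.01]))
```

This only sees the hits that were actually returned (for example the top `size`), so the cut is relative to that page. The search request's `min_score` parameter is the built-in alternative, but it applies a fixed threshold rather than detecting a gap.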

Mark_Harwood · August 12, 2021, 3:27pm

Perhaps a much simpler approach is to make all query terms required using the AND operator. In your example, the query would become a search for FOO AND 12345 as opposed to the default FOO OR 12345.
The details can depend on which query type and index mapping you are using.
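A minimal sketch of the OR versus AND difference, assuming the serial number field is analyzed roughly like Elasticsearch's standard analyzer (so FOO-12345 becomes the two terms foo and 12345). `simple_tokens` only approximates that analyzer, `matches` and `match_query_body` are hypothetical helpers, and `serial_number` is a hypothetical field name; the request body uses the `match` query's `operator` parameter.

```python
import json
import re


def simple_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def matches(query, serial, operator="or"):
    """Would `serial` match `query` under a match query with this operator?"""
    query_terms = set(simple_tokens(query))
    doc_terms = set(simple_tokens(serial))
    if operator == "and":
        return query_terms <= doc_terms  # every query term must be present
    return bool(query_terms & doc_terms)  # any shared term is enough


def match_query_body(field, text, operator="and"):
    """Search request body for a match query with the given operator."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


if __name__ == "__main__":
    serials = ["FOO-12345", "FOO-12344", "FOO-42353"]
    for op in ("or", "and"):
        print(op, [s for s in serials if matches("FOO-12345", s, op)])
    print(json.dumps(match_query_body("serial_number", "FOO-12345"), indent=2))
```

Under this assumption FOO-12344 is excluded as well, since its number term differs. If near matches like that should be kept, the index mapping and query type matter, as noted above.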


Related topics

Topic	Category	Replies	Views	Activity
Result Score descending by exact match	Elasticsearch	3	631	July 6, 2017
Elastic score as percentage	Elasticsearch	1	2493	July 23, 2019
The same query return different documents,why?	Elasticsearch	2	492	May 17, 2017
Exclude results with a score X percent away from top score	Elasticsearch	2	370	July 6, 2017
Filter out better results	Elasticsearch	3	305	June 5, 2021