_score higher than suspected

Wojciech_Rola · August 21, 2015, 7:25am

I've been waiting for the moment I'd need your help, and today is the day.
I have tags, with this mapping:

    "tag": {
        "_all": {
            "index_analyzer": "nGram_analyzer",
            "search_analyzer": "whitespace_analyzer"
        },
        "properties": {
            "name": {
                "type": "string",
                "index_analyzer": "nGram_analyzer",
                "search_analyzer": "whitespace_analyzer"
            }
        }
    }

Now I'm searching like this:

{
   "sort":[
      "_score"
   ],
   "query":{
      "bool":{
         "must":[
            {
               "match":{
                  "name":{
                     "operator":"and",
                     "query":"blue"
                  }
               }
            }
         ]
      }
   },
   "size":20
}

What do I get back? With a size of 20, I get 20 other words containing blue, for example bluesea and bluesky, but not blue itself. Why would blue not score higher than words that merely contain blue?

polyfractal · August 21, 2015, 11:01am

I'm not Shay, but I might be able to help =)

Hard to say without seeing the documents, but there is more to scoring than just token matching. For example, the length of the field is taken into account, as well as the individual term and doc frequency. It's likely that some of those ngram fragments are matching other parts of the document and contributing to a higher score.

If you add explain: true to your query, you'll get a dump of how Lucene calculated the score. It is pretty verbose, but not too terrible to read. If you gist it up, I can take a look.
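
For reference, a minimal sketch of what that looks like: it is the search body from the original post with a top-level `explain` flag added and nothing else changed. The exact layout of the explanation output depends on the Elasticsearch version, so treat this as a starting point rather than a definitive recipe.

```json
{
   "explain": true,
   "sort": [
      "_score"
   ],
   "query": {
      "bool": {
         "must": [
            {
               "match": {
                  "name": {
                     "operator": "and",
                     "query": "blue"
                  }
               }
            }
         ]
      }
   },
   "size": 20
}
```

Each hit in the response should then carry an `_explanation` object, a nested tree of `value`, `description`, and `details` entries that breaks the score down into its components.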

Wojciech_Rola · August 21, 2015, 12:03pm

@polyfractal Yes, you're right, now I see. I'm wondering, how can I determine what the _score should be based on?

polyfractal · August 21, 2015, 12:14pm

Not sure I understand your question?

In general, the actual score generated by Lucene is relatively meaningless. One query may return results from 0-1, another may return results from 0-0.03, and another 0-100. You can't really compare scores. Just think of them as the relative ranking for documents returned by the search.

Wojciech_Rola · August 21, 2015, 12:48pm

I was wondering whether it's possible to define how _score should be calculated.

polyfractal · August 21, 2015, 1:08pm

Ah, I see. Yes, you can modulate the score in a number of ways. I'd recommend reading through the Controlling Relevancy portion of the Definitive Guide, which outlines a bunch of ways to modulate or override the score.
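
As one illustration of the kind of technique covered there, here is a rough sketch that favours exact matches by adding a `should` clause next to the original `must` clause. It rests on assumptions that are not part of this thread: a hypothetical not-analyzed sub-field called `name.raw`, which would have to be added to the mapping (and the data reindexed), and a boost of 10, which is an arbitrary placeholder to tune.

```json
{
   "query": {
      "bool": {
         "must": [
            {
               "match": {
                  "name": {
                     "operator": "and",
                     "query": "blue"
                  }
               }
            }
         ],
         "should": [
            {
               "term": {
                  "name.raw": {
                     "value": "blue",
                     "boost": 10
                  }
               }
            }
         ]
      }
   },
   "size": 20
}
```

The `must` clause still decides which documents match; the `should` clause only adds to the score of documents whose whole name is exactly blue, which should push them above partial matches such as bluesea.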

Wojciech_Rola · September 1, 2015, 9:06am

Thanks.
