_score higher than expected

Hi @kimchy :smile:

I was waiting for the moment I would need your help, and it's today :smiley:.
I have tags with this mapping:

    "tag": {
        "_all": {
            "index_analyzer": "nGram_analyzer",
            "search_analyzer": "whitespace_analyzer"
        },
        "properties": {
            "name": {
                "type": "string",
                "index_analyzer": "nGram_analyzer",
                "search_analyzer": "whitespace_analyzer"
            }
        }
    }
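The mapping indexes `name` with `nGram_analyzer` but searches it with `whitespace_analyzer`, so the query `blue` stays a single token while every stored name is split into fragments. This Python sketch only illustrates that splitting. The `min_gram`/`max_gram` values of 2 and 7 are assumptions, since the analyzer's settings aren't shown here, and the real analyzer may also lowercase or order tokens differently.

```python
def ngrams(word, min_gram, max_gram):
    """Return every substring of `word` whose length is in [min_gram, max_gram]."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(word) - size + 1):
            grams.append(word[start:start + size])
    return grams


# Hypothetical settings: the thread never shows how nGram_analyzer is configured.
MIN_GRAM, MAX_GRAM = 2, 7

for word in ["blue", "bluesea", "bluesky"]:
    tokens = ngrams(word, MIN_GRAM, MAX_GRAM)
    # word, number of tokens indexed, whether the query token "blue" is among them
    print(word, len(tokens), "blue" in tokens)
```

With those settings, each of `blue`, `bluesea` and `bluesky` produces the token `blue`, so all three match the query; the exact word only differs in producing fewer tokens overall (6 versus 21).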

And I'm searching like this:

    {
        "sort": [
            "_score"
        ],
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "name": {
                                "operator": "and",
                                "query": "blue"
                            }
                        }
                    }
                ]
            }
        },
        "size": 20
    }

What do I get back? The 20 results are all other words containing "blue", for example bluesea and bluesky, but not "blue" itself. Why doesn't the exact word blue score higher than words that merely contain it?

I'm not Shay, but I might be able to help =)

It's hard to say without seeing the documents, but there is more to scoring than token matching alone. For example, the length of the field is taken into account, as well as the term frequency and inverse document frequency of each term. It's likely that some of those ngram fragments are matching other parts of the document and contributing to a higher score.
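To make those factors concrete, here is a rough sketch of Lucene's classic TF-IDF similarity (the default in Elasticsearch 1.x) for a single-term query with no boosts: `tf = sqrt(freq)`, `idf = 1 + ln(numDocs / (docFreq + 1))`, and a length norm of `1 / sqrt(number of terms in the field)`. It deliberately ignores coord, boosts, and the lossy one-byte encoding Lucene uses for norms. All document counts below are made up for illustration, and the token counts assume names split into 2- to 7-character grams (hypothetical settings). One thing worth knowing: by default Elasticsearch computes these term statistics per shard, so the same term can get a different idf on different shards.

```python
import math


def classic_term_score(freq, doc_freq, num_docs, field_len):
    """Approximate score of one query term in one document under Lucene's
    classic TF-IDF similarity, for a single-term query with no boosts.

    For a single-term query, queryNorm cancels the query-side idf, so the
    score reduces to tf * idf * fieldNorm.
    """
    tf = math.sqrt(freq)                               # term frequency
    idf = 1.0 + math.log(num_docs / (doc_freq + 1.0))  # inverse document frequency
    field_norm = 1.0 / math.sqrt(field_len)            # shorter fields score higher
    return tf * idf * field_norm


# Made-up statistics: "blue" yields 6 n-gram tokens, "bluesea" yields 21.
blue = classic_term_score(freq=1, doc_freq=40, num_docs=100, field_len=6)
sea_same_stats = classic_term_score(freq=1, doc_freq=40, num_docs=100, field_len=21)
print(blue > sea_same_stats)  # with identical statistics, the shorter field wins

# Same bluesea document, but on a shard where "blue" is (hypothetically) rare.
sea_rare_term = classic_term_score(freq=1, doc_freq=2, num_docs=100, field_len=21)
print(sea_rare_term > blue)  # the larger idf outweighs the length norm
```

The point is not the exact numbers but that tf, idf and field length trade off against each other, which is exactly what the `explain` output breaks down per document.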

If you add `"explain": true` to your search request, you'll get a dump of how Lucene calculated each score. It is pretty verbose, but not too terrible to read. If you put it in a gist, I can take a look.
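A minimal sketch of the request body with explain switched on, written as a Python dict so it can be printed as JSON. It reuses the query from the question unchanged; only the top-level `"explain": True` is new. Each hit in the response should then carry an `_explanation` object describing how its score was built.

```python
import json

# The search body from this thread, with "explain" turned on.
search_body = {
    "explain": True,
    "sort": ["_score"],
    "query": {
        "bool": {
            "must": [
                {"match": {"name": {"query": "blue", "operator": "and"}}}
            ]
        }
    },
    "size": 20,
}

# Send the printed JSON as the body of a _search request.
print(json.dumps(search_body, indent=2))
```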

@polyfractal Yes, you're right, now I see. I'm wondering how I can control what the _score is calculated from.

Not sure I understand your question?

In general, the actual score generated by Lucene is relatively meaningless on its own. One query may return scores from 0 to 1, another from 0 to 0.03, and another from 0 to 100, so you can't really compare scores across queries. Just think of them as the relative ranking of the documents returned by a single search.

I meant: is it possible to define how _score is calculated?

Ah, I see. Yes, you can modulate the score in a number of ways. I'd recommend reading through the Controlling Relevance chapter of the Definitive Guide, which outlines a bunch of ways to modulate or override the score.
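As one illustration of modulating the score with a query-time boost, here is a sketch that rewards exact matches. It assumes a hypothetical `not_analyzed` sub-field `name.raw` (not part of the original mapping; only the `name` part is shown), and existing documents would need to be reindexed before it is populated. A `not_analyzed` field is matched verbatim and case-sensitively, and the boost of 10 is an arbitrary choice.

```python
import json

# Hypothetical mapping change: keep the nGram-analyzed "name" field and add a
# not_analyzed sub-field "name.raw" that stores the whole name as one token.
tag_mapping = {
    "tag": {
        "properties": {
            "name": {
                "type": "string",
                "index_analyzer": "nGram_analyzer",
                "search_analyzer": "whitespace_analyzer",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }
}

# The "must" clause still decides which documents match; the "should" clause
# only adds score, and its (arbitrary) boost lifts exact matches to the top.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"name": {"query": "blue", "operator": "and"}}}
            ],
            "should": [
                {"term": {"name.raw": {"value": "blue", "boost": 10}}}
            ],
        }
    },
    "size": 20,
}

print(json.dumps(tag_mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Because the `should` clause is optional when a `must` clause is present, partial matches such as bluesea are still returned; they just rank below an exact `blue`.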

Thanks.