I have 2 documents that match my filter, and both have an identical value in the field being queried, yet they receive vastly different scores.
Here is the returned result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.72951484,
"hits" : [
{
"_shard" : "[taskassignment][2]",
"_node" : "yCeD_OyyQqqbBRoMgqP_ng",
"_index" : "taskassignment",
"_type" : "_doc",
"_id" : "0536f1edb103480f9d7917fdb29a2f09",
"_score" : 0.72951484,
"_source" : {
"tenantSlug" : "0536f1edb103480f9d7917fdb29a2f09",
"project" : {
"name" : "asd",
},
},
"_explanation" : {
"value" : 0.72951484,
"description" : "sum of:",
"details" : [
{
"value" : 0.72951484,
"description" : "weight(project.name.ngram:a in 11) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.72951484,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.72951484,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 13,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 27,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 5.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.0,
"description" : "match on required clause, product of:",
"details" : [
{
"value" : 0.0,
"description" : "# clause",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "tenantSlug:0536f1edb103480f9d7917fdb29a2f09",
"details" : [ ]
}
]
}
]
}
},
{
"_shard" : "[taskassignment][3]",
"_node" : "FmUxDSnbT8qvwSkPtC3Agg",
"_index" : "taskassignment",
"_type" : "_doc",
"_id" : "9536f1edb102480f9d7117fdb29a2faa",
"_score" : 0.3276874,
"_source" : {
"tenantSlug" : "0536f1edb103480f9d7917fdb29a2f09",
"project" : {
"name" : "asd",
},
"task" : {
"name" : "vbnt",
},
},
"_explanation" : {
"value" : 0.3276874,
"description" : "sum of:",
"details" : [
{
"value" : 0.3276874,
"description" : "weight(project.name.ngram:a in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.3276874,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.3276874,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 24,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 33,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.45454544,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 5.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.0,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.0,
"description" : "match on required clause, product of:",
"details" : [
{
"value" : 0.0,
"description" : "# clause",
"details" : [ ]
},
{
"value" : 1.0,
"description" : "tenantSlug:0536f1edb103480f9d7917fdb29a2f09",
"details" : [ ]
}
]
}
]
}
}
]
}
}
This is the query I ran:
GET /taskassignment/_search
{
"explain": true,
"query": {
"bool": {
"must": {
"match": { "project.name.ngram": "a" }
},
"filter": {
"term": { "tenantSlug": "0536f1edb103480f9d7917fdb29a2f09"}
}
}
}
}
These are my mappings/settings:
{
"taskassignment" : {
"mappings" : {
"properties" : {
"project" : {
"properties" : {
"name" : {
"type" : "text",
"fields" : {
"ngram" : {
"type" : "text",
"analyzer" : "ngram"
}
}
},
}
},
"tenantSlug" : {
"type" : "keyword",
"ignore_above" : 256
},
}
},
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {
"ngram" : {
"filter" : [ "lowercase" ],
"tokenizer" : "ngram"
}
},
"tokenizer" : {
"ngram" : {
"token_chars" : [
"letter",
"digit"
],
"min_gram" : "1",
"type" : "ngram",
"max_gram" : "2"
}
}
},
}
}
}
}
From what I can tell, the idf calculation is using different document counts for different records within the same query (n = 13, N = 27 for the first hit; n = 24, N = 33 for the second). How is this possible? Do I understand correctly that n is the number of documents that have the letter 'a' in the project.name.ngram field? Is that document count supposed to cover all the documents that match my filter, or all the documents in the index? Neither seems accurate. Or is it all the documents in the shard? That seems plausible, since the two hits come from different shards ([taskassignment][2] and [taskassignment][3]).

Is it possible to disable the idf calculation? In my use case I think it will cause more problems than it's worth.
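
To check my reading of the explain output, here is a minimal Python sketch that recomputes both scores from the formulas and values printed in the explanation. The function names (bm25_idf, bm25_tf, bm25_score) are my own and are not part of any Elasticsearch API; the inputs are copied from the two "_explanation" blocks above.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    # idf formula as printed in the explain output:
    # log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    # tf formula as printed in the explain output:
    # freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25_score(boost: float, n: int, N: int,
               freq: float = 1.0, k1: float = 1.2, b: float = 0.75,
               dl: float = 5.0, avgdl: float = 5.0) -> float:
    # "score(freq=1.0), product of:" boost, idf and tf
    return boost * bm25_idf(n, N) * bm25_tf(freq, k1, b, dl, avgdl)


# Values from the hit on [taskassignment][2]
score_shard2 = bm25_score(boost=2.2, n=13, N=27)
# Values from the hit on [taskassignment][3]
score_shard3 = bm25_score(boost=2.2, n=24, N=33)

print("shard 2:", score_shard2)
print("shard 3:", score_shard3)
```

Boost and tf are identical for both hits (their product is about 1.0), so the whole difference between the two scores comes from the n and N values that feed the idf term.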