Elasticsearch version: 6.8
I have an index whose documents each contain a list of nested tag objects. Each tag object has a text field (the tag itself) and a float field holding a weight. The weight describes how strongly the tag is associated with the outer document, so it is useful for scoring.
I need a query that returns documents with matching tags and scores each document by the combined weights of all its matching tags.
What I have so far:
Index mapping:
{
  "nested-scoring-test" : {
    "mappings" : {
      "record" : {
        "properties" : {
          "tags" : {
            "type" : "nested",
            "properties" : {
              "tag" : {
                "type" : "text"
              },
              "weight" : {
                "type" : "float"
              }
            }
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
}
Test document:
{
  "name": "Test",
  "tags": [
    {
      "tag": "Captain Falcon Only",
      "weight": 0.5
    },
    {
      "tag": "Captain Kirk Only",
      "weight": 0.25
    },
    {
      "tag": "Falcon Punch Only",
      "weight": 0.1
    }
  ]
}
Query so far (the nested path and field names use "tags" to match the mapping above):
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "function_score": {
          "query": {
            "constant_score": {
              "filter": {
                "match": {
                  "tags.tag": "Only"
                }
              }
            }
          },
          "functions": [
            {
              "field_value_factor": {
                "field": "tags.weight",
                "factor": 1
              }
            }
          ],
          "boost_mode": "replace"
        }
      }
    }
  }
}
For this query, I expect a score of 0.85 (0.5 + 0.25 + 0.1). However, the actual score is 0.28333333. That is even lower than the score I get when I match the word "Captain" instead (0.375), which matches only two tags rather than three. This makes me think TF-IDF is getting involved, even though I'm trying to set the scores absolutely.
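As a sanity check on these numbers, here is a small Python sketch that compares the sum of the matching weights with their mean. It is plain arithmetic, not Elasticsearch code: the helper names `matching_weights` and `mean` are made up for illustration, and the token matching is only a rough stand-in for what the analyzer does. Both observed scores coincide with the mean, which would be consistent with the scores of matching nested documents being averaged (the `nested` query's `score_mode` defaults to `avg`), though I have not confirmed that this is the cause.

```python
# Tags of the test document as (text, weight) pairs, copied from above.
TAGS = [
    ("Captain Falcon Only", 0.5),
    ("Captain Kirk Only", 0.25),
    ("Falcon Punch Only", 0.1),
]


def matching_weights(tags, term):
    """Return the weights of tags that contain `term` as a token.

    Assumption: lowercasing and splitting on whitespace approximates
    a match query on an analyzed text field closely enough here.
    """
    term = term.lower()
    return [weight for text, weight in tags if term in text.lower().split()]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


for term in ("Only", "Captain"):
    weights = matching_weights(TAGS, term)
    # Rounded to 8 decimals for comparison with the reported scores.
    print(term, "sum:", round(sum(weights), 8), "mean:", round(mean(weights), 8))
```

For "Only" the mean is 0.28333333 and for "Captain" it is 0.375, matching the scores reported above, while the sums are 0.85 and 0.75.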
The explain output is not much help either:
"explanation" : {
  "value" : 0.28333333,
  "description" : "sum of:",
  "details" : [ {
    "value" : 0.28333333,
    "description" : "Score based on 3 child docs in range from 0 to 2, best match:",
    "details" : [ {
      "value" : 0.5,
      "description" : "sum of:",
      "details" : [ {
        "value" : 0.5,
        "description" : "min of:",
        "details" : [ {
          "value" : 0.5,
          "description" : "field value function: none(doc['tag.weight'].value * factor=1.0)",
          "details" : [ ]
        }, {
          "value" : 3.4028235E38,
          "description" : "maxBoost",
          "details" : [ ]
        } ]
      }, {
        "value" : 0.0,
        "description" : "match on required clause, product of:",
        "details" : [ {
          "value" : 0.0,
          "description" : "# clause",
          "details" : [ ]
        }, {
          "value" : 1.0,
          "description" : "_type:__tags",
          "details" : [ ]
        } ]
      } ]
    } ]
  } ]
}
The "Score based on 3 child docs..." step is where the best child score of 0.5 becomes 0.28333333 with no further explanation, so that is probably where TF-IDF gets involved.
I could live with these results, except that the adjustment seems to be local to each outer document. Case in point: if I add another document that looks like this:
{
  "name": "Test Again",
  "tags": [
    {
      "tag": "Officer Eddie Only",
      "weight": 0.5
    },
    {
      "tag": "Officer Ward Excluded",
      "weight": 0.25
    },
    {
      "tag": "Officer Earhart Excluded",
      "weight": 0.1
    }
  ]
}
The original query returns it as the top result with a score of 0.5, even though only one of its tags matches "Only" while all three tags of the first document do.
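To make the ranking problem concrete, the following Python sketch ranks the two documents for the term "Only" under two ways of combining the matching weights: the mean, which reproduces the ordering I observe, and the sum, which is the ordering I want. Again this is only arithmetic outside Elasticsearch; `rank`, `mean`, and `matching_weights` are hypothetical helpers, and the whitespace tokenization is an assumption.

```python
# Both test documents as name -> list of (tag text, weight) pairs.
DOCS = {
    "Test": [
        ("Captain Falcon Only", 0.5),
        ("Captain Kirk Only", 0.25),
        ("Falcon Punch Only", 0.1),
    ],
    "Test Again": [
        ("Officer Eddie Only", 0.5),
        ("Officer Ward Excluded", 0.25),
        ("Officer Earhart Excluded", 0.1),
    ],
}


def matching_weights(tags, term):
    """Weights of tags containing `term` as a lowercase whitespace token."""
    term = term.lower()
    return [weight for text, weight in tags if term in text.lower().split()]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def rank(docs, term, combine):
    """Score each document with `combine` over its matching weights, best first.

    Documents with no matching tag are left out, as a query would not return them.
    """
    scored = []
    for name, tags in docs.items():
        weights = matching_weights(tags, term)
        if weights:
            scored.append((name, combine(weights)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


by_mean = rank(DOCS, "Only", mean)  # ordering I currently get
by_sum = rank(DOCS, "Only", sum)    # ordering I expect

print("mean:", [(name, round(score, 8)) for name, score in by_mean])
print("sum: ", [(name, round(score, 8)) for name, score in by_sum])
```

Under the mean, "Test Again" (0.5) outranks "Test" (0.28333333); under the sum, "Test" (0.85) outranks "Test Again" (0.5).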
How can I modify my query so that my scoring works as I expect?