Multiple Nested Object Function Scoring

ES Version: 6.8

I have an index storing documents that contain a list of nested tag objects. Each tag object has a text field (the tag itself) and a float field holding a weight. The weight describes the strength of association between the tag and the outer document, and is therefore useful for scoring.

I need to create a query that returns documents with matching tags and scores them based on the weights of all matching tags.

What I have so far:

Index:

{
  "nested-scoring-test" : {
    "mappings" : {
      "record" : {
        "properties" : {
          "tags" : {
            "type" : "nested",
            "properties" : {
              "tag" : {
                "type" : "text"
              },
              "weight" : {
                "type" : "float"
              }
            }
          },
          "name" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

Test Document:

{
  "name": "Test",
  "tags": [
    {
      "tag": "Captain Falcon Only",
      "weight": 0.5
    },
    {
      "tag": "Captain Kirk Only",
      "weight": 0.25
    },
    {
      "tag": "Falcon Punch Only",
      "weight": 0.1
    }
  ]
}

Query So Far:

{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "function_score": {
          "query": {
            "constant_score": {
              "filter": {
                "match": {
                  "tags.tag": "Only"
                }
              }
            }
          },
          "functions": [
            {
              "field_value_factor": {
                "field": "tags.weight",
                "factor": 1
              }
            }
          ],
          "boost_mode": "replace"
        }
      }
    }
  }
}

For this query, I would expect the returned score to be 0.85 (0.5 + 0.25 + 0.1). However, it is actually 0.28333333. In fact, that is lower than the 0.375 I get when matching the word "Captain", which matches only two tags instead of three. This makes me think TF-IDF is getting involved, even though I am trying to set the scores absolutely.
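
As a sanity check on these numbers, the arithmetic below compares the sum and the mean of the matching tag weights. It assumes (the explain output does not confirm this) that the nested query combines child scores with its default `score_mode` of `avg`; the variable names are illustrative only.

```python
# Weights of the nested tags that match each search term in the test document.
only_weights = [0.5, 0.25, 0.1]   # "Only" matches all three tags
captain_weights = [0.5, 0.25]     # "Captain" matches two tags


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


only_sum = sum(only_weights)          # the expected score, about 0.85
only_avg = mean(only_weights)         # about 0.2833, close to the returned score
captain_avg = mean(captain_weights)   # 0.375, the score returned for "Captain"

print(f"sum={only_sum:.4f} avg={only_avg:.8f} captain_avg={captain_avg}")
```

Both observed scores line up with the means, which would point at how child scores are combined rather than at TF-IDF.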

The explain is also not super helpful:

"explanation" : {
    "value" : 0.28333333,
    "description" : "sum of:",
    "details" : [ {
      "value" : 0.28333333,
      "description" : "Score based on 3 child docs in range from 0 to 2, best match:",
      "details" : [ {
        "value" : 0.5,
        "description" : "sum of:",
        "details" : [ {
          "value" : 0.5,
          "description" : "min of:",
          "details" : [ {
            "value" : 0.5,
            "description" : "field value function: none(doc['tag.weight'].value * factor=1.0)",
            "details" : [ ]
          }, {
            "value" : 3.4028235E38,
            "description" : "maxBoost",
            "details" : [ ]
          } ]
        }, {
          "value" : 0.0,
          "description" : "match on required clause, product of:",
          "details" : [ {
            "value" : 0.0,
            "description" : "# clause",
            "details" : [ ]
          }, {
            "value" : 1.0,
            "description" : "_type:__tags",
            "details" : [ ]
          } ]
        } ]
      } ]
    } ]
}

The "Score based on 3 child docs..." step is seemingly magic and is probably where TF-IDF gets involved.

I could live with these results, except that the TF-IDF seems to be localized to the outer document. Case in point: if I add another document that looks like this:

{
  "name": "Test Again",
  "tags": [
    {
      "tag": "Officer Eddie Only",
      "weight": 0.5
    },
    {
      "tag": "Officer Ward Excluded",
      "weight": 0.25
    },
    {
      "tag": "Officer Earhart Excluded",
      "weight": 0.1
    }
  ]
}

The original query will return it as the top result with a score of 0.5.
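
The ranking flip can be reproduced with simple arithmetic. This sketch assumes child scores are averaged per parent document (the nested query's default `score_mode`); `score_parent` and `rank` are hypothetical helpers, not Elasticsearch APIs.

```python
# Weights of the tags containing "Only" in each parent document.
matching_weights = {
    "Test": [0.5, 0.25, 0.1],   # all three tags match
    "Test Again": [0.5],        # only "Officer Eddie Only" matches
}


def score_parent(weights, score_mode):
    """Combine child scores the way the assumed score_mode would."""
    if score_mode == "avg":
        return sum(weights) / len(weights)
    if score_mode == "sum":
        return sum(weights)
    raise ValueError(f"unsupported score_mode: {score_mode}")


def rank(score_mode):
    """Return parent names ordered by descending combined score."""
    scores = {
        name: score_parent(weights, score_mode)
        for name, weights in matching_weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank("avg"))  # averaging puts "Test Again" (0.5) first
print(rank("sum"))  # summing puts "Test" (about 0.85) first
```

Under averaging, a document with one strong match outranks a document with several matches of mixed strength; under summing, the order reverses.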

How can I modify my query so that my scoring works as I expect?
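
One possible direction, offered as an untested sketch: the nested query accepts a `score_mode` parameter (`avg` by default, with `sum`, `max`, `min` and `none` as alternatives), so setting it to `sum` should add up the weights of all matching tags. The snippet builds the request body as a Python dict and prints it as JSON; the `build_query` helper and its arguments are illustrative, and the field names follow the `tags` path from the mapping.

```python
import json


def build_query(term, score_mode="sum"):
    """Build a nested function_score request body (hypothetical helper)."""
    return {
        "query": {
            "nested": {
                "path": "tags",
                # Assumption: summing child scores yields the desired total.
                "score_mode": score_mode,
                "query": {
                    "function_score": {
                        "query": {
                            "constant_score": {
                                "filter": {"match": {"tags.tag": term}}
                            }
                        },
                        "functions": [
                            {
                                "field_value_factor": {
                                    "field": "tags.weight",
                                    "factor": 1,
                                }
                            }
                        ],
                        # Use the weight itself as each child's score.
                        "boost_mode": "replace",
                    }
                },
            }
        }
    }


print(json.dumps(build_query("Only"), indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index; whether the resulting scores come out as expected would still need to be verified on a 6.8 cluster.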
