Field_value_factor on nested field with dynamic mapping

inversive · November 28, 2018, 8:34am

I'm trying to boost the _score of documents using a double value stored in a dynamic field inside a nested structure.

This is (part of) my mapping:

{
  "_doc": {
    ...
    "dynamic": "strict",
    "properties": {
      "raw": {
        "type": "keyword",
        "index": false,
        "ignore_above": 0
      },
      "indexed": {
        "type": "text",
        "term_vector": "yes",
        "analyzer": "edge_ngram_analyzer",
        "search_analyzer": "standard",
        "fields": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        }
      },
      ...
      "data": {
        "type": "nested",
        "dynamic": "strict",
        "properties": {
          "key": {"type": "keyword"},
          "name": {"type": "keyword"},
          "type": {"type": "keyword", "index": false},
          "value_string": {"type": "keyword"},
          "value_double": {"type": "double"},
          "value_boolean": {"type": "boolean"},
          "value_date": {"type": "date"}
        }
      },
      "filtered": {
        "type": "nested",
        "dynamic": "strict",
        "properties": {
          "key": {"type": "keyword"},
          "name": {"type": "keyword"},
          "value": {
            "type": "text",
            "analyzer": "folding_analyzer"
          },
          "raw": {
            "type": "text",
            "analyzer": "kw_lowercase_analyzer"
          }
        }
      }
    }
  }
}

The field is available in the nested data structure.

Here's part of an example doc (truncated because of this forum's character limit):

...
"_source":{
    ...
    "data":[
        {
            "key":"84f2c",
            "name":"description",
            "value_string":"Foo bar",
            "type":"string"
        },
        ...
        {
            "key":"672c5",
            "name":"views",
            "value_double":18,
            "type":"double"
        }
    ],
    "filtered":[
        {
            "key": "84f2c",
            "name": "description",
            "value": "foo bar",
            "raw": "foo bar"
        },
        ...
    ],
    "indexed":"foo bar",
    ...
}
...
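The example doc stores every attribute as its own entry under data, with the value in exactly one type-specific field. The short Python sketch below reproduces that shape; to_data_entries is a hypothetical helper, not part of the original setup, and it only covers string, double and boolean values (value_date is left out). For the two entries shown, its output matches the example doc.

```python
# Hypothetical helper, not from the original post: builds the nested "data"
# entries for one document from {key: (name, value)} pairs.
def to_data_entries(attributes):
    entries = []
    for key, (name, value) in attributes.items():
        # Check bool before int/float: bool is a subclass of int in Python.
        if isinstance(value, bool):
            value_type = "boolean"
        elif isinstance(value, (int, float)):
            value_type = "double"
        elif isinstance(value, str):
            value_type = "string"
        else:
            # Dates (value_date) are left out of this sketch.
            raise TypeError(f"unsupported value type for {key!r}: {type(value).__name__}")
        entries.append({
            "key": key,
            "name": name,
            # Exactly one typed value field is set per entry.
            f"value_{value_type}": value,
            "type": value_type,
        })
    return entries


data = to_data_entries({"84f2c": ("description", "Foo bar"), "672c5": ("views", 18)})
print(data)
```

Because each entry carries only one value_* field, the description entry has no value_double at all; the views entry has to be selected by its key before its value can be used.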

The goal is to factor in views (18 in this example) as a boost value, for example through log1p to smooth out large values.
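For reference, field_value_factor computes modifier(factor × value) for each document. The Python sketch below mirrors that formula so the effect of the smoothing is easy to check by hand; the function and table names are illustrative only, not Elasticsearch APIs. One detail worth knowing: in the Elasticsearch documentation, log1p means the common logarithm log10(1 + x), whereas Python's math.log1p is the natural logarithm (Elasticsearch's ln1p).

```python
import math

# Modifiers as listed in the Elasticsearch field_value_factor documentation.
# "log1p" there is log10(1 + x); Python's math.log1p matches "ln1p" instead.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": math.log1p,
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Score = modifier(factor * value); `missing` stands in for an absent value."""
    if value is None:
        if missing is None:
            raise ValueError("no field value and no 'missing' default")
        # The missing value still goes through factor and modifier.
        value = missing
    return MODIFIERS[modifier](factor * value)


print(field_value_factor(18, factor=2, modifier="log1p"))  # log10(37), about 1.568
```

With factor 2 and log1p, views = 18 becomes roughly 1.57 instead of 36, so a document with 10 times the views gains only about one extra point.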

Here's my (simplified) query:

{
"query": {
  "function_score": {
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "collection_id": "5bf6f8c51cd759010c0e70d4"
            }
          }
        ],
        "must": [
          {
            "dis_max": {
              "queries": [
                {
                  "match": {
                    "indexed": {
                      "boost": 1,
                      "zero_terms_query": "all",
                      "query": "foo",
                      "minimum_should_match": "75%"
                    }
                  }
                },
                {
                  "nested": {
                    "path": "filtered",
                    "query": {
                      "bool": {
                        "must": [
                          {
                            "term": {
                              "filtered.name": {
                                "value": "description"
                              }
                            }
                          },
                          {
                            "match": {
                              "filtered.value": {
                                "query": "foo",
                                "boost": 6,
                                "fuzziness": 0
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                },
                {
                  "nested": {
                    "path": "filtered",
                    "query": {
                      "term": {
                        "filtered.raw": {
                          "value": "foo",
                          "boost": 10
                        }
                      }
                    }
                  }
                }
              ],
              "tie_breaker": 0.3
            }
          }
        ]
      }
    },
    "functions": [
      {
        "field_value_factor": {
          "field": "data.value_double",
          "factor": 2,
          "modifier": "none",
          "missing": 1
        },
        "filter": {
          "nested": {
            "path": "data",
            "query": {
              "term": {
                "data.key": "672c5"
              }
            }
          }
        }
      }
    ],
    "boost_mode": "replace"
  }
},
"_source": {
  "includes": [
    "data"
  ]
},
"track_scores": true
}

The query does a dis_max over three queries (in this example):

1. A match on the indexed field (a combined field built by concatenating several fields, with the most analyzers applied to it).
2. A nested bool query on one specific field (description) that matches that nested document's value in filtered, which uses relatively simple analyzers.
3. A big boost if there's an exact match on the raw part of a filtered entry.
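Per the dis_max documentation, a document's score is the score of its best matching clause plus tie_breaker times the scores of the other matching clauses. A small Python sketch of that rule; dis_max_score is an illustrative name and the clause scores are made up, not taken from a real query.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Score of a dis_max query from the scores of the clauses that matched."""
    if not clause_scores:
        raise ValueError("no clause matched, so the document does not match")
    best = max(clause_scores)
    # Every other matching clause contributes tie_breaker times its score.
    return best + tie_breaker * (sum(clause_scores) - best)


# Made-up clause scores: indexed match, nested description match, raw exact match.
print(dis_max_score([1.2, 3.0, 10.0], tie_breaker=0.3))
```

With tie_breaker 0.3, a document matching all three clauses scores 10 + 0.3 × (1.2 + 3.0) = 11.26, so the exact raw match dominates while the other matches still break ties.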

The key part is the functions section of the query.

(I'm using boost_mode=replace for debugging, so I can quickly see whether a document's score equals the field value.)

The resulting docs don't get the field value as their score; they get the missing/fallback value (1). I suspect that either the filter isn't doing its job or the reference to the field via data.value_double is failing.

How can I construct the function so that it picks up the value of data.value_double and uses it as a scoring factor?
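A likely explanation for the fallback value, offered as an assumption rather than a confirmed diagnosis: the top-level function_score is evaluated on the root document, while each data entry is indexed as a separate hidden nested document, so the root document has no data.value_double of its own and field_value_factor falls back to missing. The nested filter only decides whether the function applies; it does not change which document the value is read from. A commonly used pattern is to move the function_score inside a nested query on data and let the nested query's score_mode carry the result up to the root document. The Python sketch below only builds that request body as a dict; nested_views_boost and with_boost are hypothetical helper names, log1p with factor 1 is an example choice, and the plain match stands in for the original filter and dis_max, which would go in the same bool.

```python
import json


def nested_views_boost(field_key, factor=1.0, modifier="log1p", missing=1.0, score_mode="max"):
    """Nested clause whose score is field_value_factor evaluated on the nested docs."""
    return {
        "nested": {
            "path": "data",
            # How the per-nested-doc scores are combined into the root doc's score.
            "score_mode": score_mode,
            "query": {
                "function_score": {
                    # Only the nested entry with the wanted key contributes.
                    "query": {"term": {"data.key": field_key}},
                    "functions": [{
                        "field_value_factor": {
                            "field": "data.value_double",
                            "factor": factor,
                            "modifier": modifier,
                            "missing": missing,
                        }
                    }],
                    # Use the function value alone as the nested doc's score.
                    "boost_mode": "replace",
                }
            },
        }
    }


def with_boost(text_query, field_key):
    """Combine an existing text query with the nested boost in one bool query."""
    return {
        "query": {
            "bool": {
                "must": [text_query],
                # Optional clause: its score is added when the entry exists.
                "should": [nested_views_boost(field_key)],
            }
        }
    }


body = with_boost({"match": {"indexed": "foo"}}, "672c5")
print(json.dumps(body, indent=2))
```

Because the nested clause sits in should next to must, its score is added to the text score rather than multiplied, and documents without the views entry still match, just without the boost. With score_mode "max", the highest function value wins if several entries match the key.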

inversive · November 28, 2018, 1:30pm

I found a small workaround using a script. It's probably not the most efficient way, but it works fine for my use case.

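I added a script_score function to the functions array. The multi-line source below uses Kibana Console's triple-quote syntax; in a plain JSON request body the script has to be a single-line string:

{
  "script_score": {
    "script": {
      "lang": "painless",
      "source": """
        double boost_factor = 1;
        if (params._source.containsKey('data')) {
          for (item in params._source.data) {
            if (item.key == params.field_key) {
              boost_factor = Math.log10(item[params.field_type] + 2);
              break;
            }
          }
        }
        return boost_factor;
      """,
      "params": {
        "field_key": "fd393",
        "field_type": "value_double"
      }
    }
  }
}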
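To see what the script returns, here is a Python mirror of its logic; script_boost is a name made up for this sketch. Note that log10(x + 2) is exactly what the log2p modifier of field_value_factor computes, so the script yields the same number a log2p function with factor 1 would give if it could read the nested value. Unlike the Painless original, the sketch checks for a missing value on the matched entry.

```python
import math


def script_boost(source, field_key, field_type="value_double"):
    """Python mirror of the Painless script: log10(value + 2) for the entry with field_key."""
    boost_factor = 1.0
    # Like the script, walk the nested entries in _source until the key matches.
    for item in source.get("data", []):
        if item.get("key") == field_key:
            value = item.get(field_type)
            # Added null check: the Painless version has none, so an entry without
            # this value field would likely make the script fail.
            if value is not None:
                boost_factor = math.log10(value + 2)
            break
    return boost_factor


print(script_boost({"data": [{"key": "fd393", "value_double": 18}]}, "fd393"))  # log10(20), about 1.301
```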

I suspect this won't perform well if it runs a lot, because for every matching document the script has to read _source and iterate over all nested data entries.

I'm still open to a more 'native' way using field_value_factor!

inversive · December 4, 2018, 10:13am

Unfortunately, I haven't found any other way to apply field_value_factor to a nested dynamic field. Any help is appreciated!

system · January 1, 2019, 10:13am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
How to disable nested data type keyword in dynamic mapping	Elasticsearch	10	6847	May 3, 2017
Field_value_factor on a nested field value	Elasticsearch	2	4228	July 5, 2017
Nested datatype and dynamic mapping	Elasticsearch	2	3102	May 24, 2017
Field value factor with function query in nested object not found	Elasticsearch	1	713	December 5, 2016
Dynamic scoring based on doc field	Elasticsearch	2	1078	November 9, 2018