Field_value_factor on nested field with dynamic mapping


(Bart) #1

I'm trying to boost the _score of documents by using the value (double) that is present in a dynamic field in a nested structure.

This is (part of) my mapping:

{
  "_doc": {
    ...
    "dynamic": "strict",
    "properties": {
      "raw": {
        "type": "keyword",
        "index": false,
        "ignore_above": 0
      },
      "indexed": {
        "type": "text",
        "term_vector": "yes",
        "analyzer": "edge_ngram_analyzer",
        "search_analyzer": "standard",
        "fields": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        }
      },
      ...
      "data": {
        "type": "nested",
        "dynamic": "strict",
        "properties": {
          "key": {"type": "keyword"},
          "name": {"type": "keyword"},
          "type": {"type": "keyword", "index": false},
          "value_string": {"type": "keyword"},
          "value_double": {"type": "double"},
          "value_boolean": {"type": "boolean"},
          "value_date": {"type": "date"}
        }
      },
      "filtered": {
        "type": "nested",
        "dynamic": "strict",
        "properties": {
          "key": {"type": "keyword"},
          "name": {"type": "keyword"},
          "value": {
            "type": "text",
            "analyzer": "folding_analyzer"
          },
          "raw": {
            "type": "text",
            "analyzer": "kw_lowercase_analyzer"
          }
        }
      }
    }
  }
}

The field is available in the nested data structure.

Here's part of an example document (trimmed because of the character limit on this forum):

...
"_source":{
    ...
    "data":[
        {
            "key":"84f2c",
            "name":"description",
            "value_string":"Foo bar",
            "type":"string"
        },
        ...
        {
            "key":"672c5",
            "name":"views",
            "value_double":18,
            "type":"double"
        }
    ],
    "filtered":[
        {
            "key": "84f2c",
            "name": "description",
            "value": "foo bar",
            "raw": "foo bar"
        },
        ...
    ],
    "indexed":"foo bar",
    ...
}
...

The goal is to factor in views (18 in this example) as a boosting value, for example through log1p to smooth out large values.
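To make the smoothing concrete: as documented, field_value_factor multiplies the field value by factor and then applies the modifier, and log1p is the common (base-10) logarithm of that value plus one. The sketch below only reproduces this arithmetic in Python; the helper name field_value_factor_score is made up for illustration and is not part of any Elasticsearch client.

```python
import math

def field_value_factor_score(value, factor=1.0, modifier="none"):
    """Reproduce the field_value_factor arithmetic for two modifiers (illustrative only)."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # common logarithm of (scaled value + 1)
        return math.log10(scaled + 1)
    raise ValueError(f"modifier not covered by this sketch: {modifier}")

print(field_value_factor_score(18, modifier="log1p"))     # about 1.28
print(field_value_factor_score(18000, modifier="log1p"))  # about 4.26
```

A thousandfold increase in views therefore moves the factor from roughly 1.3 to roughly 4.3, which is the dampening effect wanted here.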

Here's my (simplified) query:

{
"query": {
  "function_score": {
    "query": {
      "bool": {
        "filter": [
          {
            "term": {
              "collection_id": "5bf6f8c51cd759010c0e70d4"
            }
          }
        ],
        "must": [
          {
            "dis_max": {
              "queries": [
                {
                  "match": {
                    "indexed": {
                      "boost": 1,
                      "zero_terms_query": "all",
                      "query": "foo",
                      "minimum_should_match": "75%"
                    }
                  }
                },
                {
                  "nested": {
                    "path": "filtered",
                    "query": {
                      "bool": {
                        "must": [
                          {
                            "term": {
                              "filtered.name": {
                                "value": "description"
                              }
                            }
                          },
                          {
                            "match": {
                              "filtered.value": {
                                "query": "foo",
                                "boost": 6,
                                "fuzziness": 0
                              }
                            }
                          }
                        ]
                      }
                    }
                  }
                },
                {
                  "nested": {
                    "path": "filtered",
                    "query": {
                      "term": {
                        "filtered.raw": {
                          "value": "foo",
                          "boost": 10
                        }
                      }
                    }
                  }
                }
              ],
              "tie_breaker": 0.3
            }
          }
        ]
      }
    },
    "functions": [
      {
        "field_value_factor": {
          "field": "data.value_double",
          "factor": 2,
          "modifier": "none",
          "missing": 1
        },
        "filter": {
          "nested": {
            "path": "data",
            "query": {
              "term": {
                "data.key": "672c5"
              }
            }
          }
        }
      }
    ],
    "boost_mode": "replace"
  }
},
"_source": {
  "includes": [
    "data"
  ]
},
"track_scores": true
}

The query runs a dis_max over three queries (in this example):

  • A query on the indexed field (a combined field that is the concatenation of multiple fields and has the most analyzers on it)
  • A bool query on a specific field, matching on the value of that nested document with some relatively simple analyzers (filtered)
  • A big boost if we find an exact match in the raw part of a field
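For reference, dis_max scores a document with its best-matching clause and adds tie_breaker times the scores of the other matching clauses. A small Python sketch of that formula follows; the sub-scores are hypothetical and not taken from a real response.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best sub-score plus tie_breaker times the sum of the other matching sub-scores."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)

# Hypothetical scores for the indexed, filtered.value and filtered.raw clauses.
print(dis_max_score([1.2, 6.0, 10.0], tie_breaker=0.3))  # 10.0 + 0.3 * 7.2, about 12.16
```

With tie_breaker at 0.3, as in the query above, the exact match on the raw field dominates and the other two clauses contribute only a fraction of their scores.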

The gist is in the functions part of the query.

(I've used boost_mode=replace for debugging, to quickly see whether the document gets a score equal to the field value.)

The resulting docs don't get the field's value as their score; they get the missing/fallback value (=1) instead. I suspect that either the filter isn't doing its job or the reference to the field via data.value_double is failing.

How can I construct the function so that it picks up the value of data.value_double and uses it as a scoring factor?
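A likely explanation, not confirmed anywhere in this thread: Elasticsearch indexes every object of a nested field as its own hidden document, separate from the root document. A function in a top-level function_score is evaluated against the root document, which holds no value for data.value_double, so it falls back to missing. The toy model below illustrates the idea in Python; to_lucene_docs is a made-up helper and a strong simplification of the real indexing.

```python
def to_lucene_docs(source, nested_paths):
    """Split a _source dict into hidden nested docs plus one root doc (toy model)."""
    docs = []
    root = {}
    for field, value in source.items():
        if field in nested_paths:
            # each nested object becomes its own document with path-prefixed fields
            for obj in value:
                docs.append({f"{field}.{k}": v for k, v in obj.items()})
        else:
            root[field] = value
    docs.append(root)  # the root document holds only the non-nested fields
    return docs

source = {
    "indexed": "foo bar",
    "data": [
        {"key": "84f2c", "name": "description", "value_string": "Foo bar", "type": "string"},
        {"key": "672c5", "name": "views", "value_double": 18, "type": "double"},
    ],
}
docs = to_lucene_docs(source, nested_paths={"data"})
root = docs[-1]
print(root.get("data.value_double", "missing"))  # prints: missing
print(docs[1]["data.value_double"])              # prints: 18
```

In this model the value 18 exists only on the second nested document, so a lookup on the root document finds nothing.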


(Bart) #2

I figured out a small workaround with a script. It's probably not the most efficient way, but it works fine for my use case.

I added this script to functions:

"script": {
  "lang": "painless",
  "source": "
    double boost_factor = 1;
    if (params._source.containsKey('data')) {
      for (item in params._source.data) {
        if (item.key == params.field_key) {
          boost_factor = Math.log10(item[params.field_type] + 2);
          break;
        }
      }
    }
    return boost_factor;
  ",
  "params": {
    "field_key": "fd393",
    "field_type": "value_double"
  }
}

I suspect this won't be efficient enough if it is executed a lot, because the script has to iterate over all nested data entries.
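To show what the script computes and where the cost comes from, here is a Python mirror of the same logic. It is only an illustration run against the example document from the first post; the function name boost_from_source is made up. The loop is linear in the number of entries in data, and it runs once for every document that is scored.

```python
import math

def boost_from_source(source, field_key, field_type):
    """Mirror of the Painless script: scan data for field_key, return log10(value + 2)."""
    boost_factor = 1.0
    for item in source.get("data", []):
        if item.get("key") == field_key:
            boost_factor = math.log10(item[field_type] + 2)
            break
    return boost_factor

source = {"data": [
    {"key": "84f2c", "name": "description", "value_string": "Foo bar", "type": "string"},
    {"key": "672c5", "name": "views", "value_double": 18, "type": "double"},
]}
print(boost_from_source(source, "672c5", "value_double"))    # log10(20), about 1.30
print(boost_from_source(source, "missing", "value_double"))  # 1.0, key not found
```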

I'm still open to a more 'native' way of doing this with field_value_factor!


(Bart) #3

Unfortunately, I didn't find any other way to apply a field_value_factor to a nested dynamic field... Any help is appreciated!
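One approach that may be worth trying, untested against the mapping in this thread: put the function_score inside a nested query, so that field_value_factor is evaluated against the nested document that actually holds data.value_double, and let score_mode carry that score up to the parent. The Python sketch below only builds the request body; nested_views_boost is a hypothetical helper, and the choices of score_mode max and modifier log1p are assumptions.

```python
import json

def nested_views_boost(key, path="data", field="value_double", modifier="log1p"):
    """Build a nested clause whose score comes from field_value_factor on the nested doc."""
    return {
        "nested": {
            "path": path,
            "score_mode": "max",  # pass the matching nested doc's score to the parent
            "query": {
                "function_score": {
                    # select only the nested entry with the wanted key
                    "query": {"term": {f"{path}.key": key}},
                    "field_value_factor": {
                        "field": f"{path}.{field}",
                        "modifier": modifier,
                    },
                    # use the function value instead of the term query's score
                    "boost_mode": "replace",
                }
            },
        }
    }

print(json.dumps(nested_views_boost("672c5"), indent=2))
```

The resulting clause could be added to a should list next to the existing dis_max in must, so that documents with a views entry get an additive bonus while documents without one still match.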