Nested value on function score


#1

Hi, I have a document with a nested array, and I'm trying to read a nested value in a function score, but I can't (I always get 0).

For example, given the following mapping:

{
  "mapping":{
    "movie":{
      "properties":{
        "title":{ "type":"string" },
        "reviews":{
          "type":"nested",
          "properties":{
            "rating":{ "type":"long" },
            "userId":{ "type":"string", "index":"not_analyzed" }
          }
        }
      }
    }
  }
}

I want to set the score to the rating given by a particular user. To do that, I execute this query:

{
  "query" : {
    "function_score" : {
      "functions" : [ {
        "filter" : {
          "nested" : {
            "filter" : { "term" : { "userId" : "1" } },
            "path" : "reviews"
          }
        },
        "script_score" : {
          "script" : "doc['reviews.rating'].value"
        }
      } ]
    }
  }
}

But that query returns _score = 0 for all records.

Is there a way to score based on a nested value?
(I'm using a function score instead of nested sorting because I have several score functions and I take the max.)

Thanks, Claudio.

Here are some example documents:

{
   "title": "The Godfather",
   "reviews": [
      { "userId": "1", "rating": "5" },
      { "userId": "2", "rating": "4" }
   ]
}

{
   "title": "Rocky",
   "reviews": [
      { "userId": "1", "rating": "3" },
      { "userId": "2", "rating": "5" }
   ]
}

(Mike Simos) #2

You'd want to do something like this and you may want to consider using field value factor:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor

{
  "query": {
    "filtered": {
      "query": {
        "nested": {
          "path": "reviews",
          "query": {
            "function_score": {
              "query": {
                "term": {
                  "reviews.userId": {
                    "value": "1"
                  }
                }
              },
              "functions": [
                {
                  "field_value_factor": {
                    "field": "reviews.rating"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

#3

Thanks Mike, your query works fine.

However, I'm not sure how to fit this into a bigger function_score query.

I have a function_score query which wraps the main query (a filtered one) and has several functions with a max score_mode.
Ideally I would like to add the nested value score as a new function. But it seems that to access tracking.rating in the function, the function_score has to be inside a nested query.

So basically the question is: given a function_score query, is there a way to add a new function that accesses the tracking.rating of a particular user, without putting the whole query inside a nested query?


#4

I was able to add this query to a bigger one using a DisMax query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html).
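
For anyone landing here later, a rough sketch of what that combination can look like, written as a Python dict so it can be serialized and checked. The nested part is Mike's query from above; the helper name build_max_score_query, the other_query argument and the sample title match are made up for illustration and are not part of any Elasticsearch client API. dis_max scores a document by its best-scoring clause, which is what makes it behave like a max.

```python
import json


def build_max_score_query(user_id, other_query):
    """Combine an existing query with a nested rating score via dis_max.

    build_max_score_query and other_query are illustrative names only.
    """
    # Nested part: same shape as the query posted earlier in this thread.
    nested_rating = {
        "nested": {
            "path": "reviews",
            "query": {
                "function_score": {
                    "query": {"term": {"reviews.userId": {"value": user_id}}},
                    "functions": [
                        {"field_value_factor": {"field": "reviews.rating"}}
                    ],
                }
            },
        }
    }
    # dis_max keeps the highest score among its clauses for each document.
    return {"query": {"dis_max": {"queries": [other_query, nested_rating]}}}


# Hypothetical main query, only here to show the call.
body = build_max_score_query("1", {"match": {"title": "godfather"}})
print(json.dumps(body, indent=2))
```

Note that with the default boost_mode the nested clause's score is the term query's score multiplied by the rating, not the bare rating, so the numbers may need tuning before comparing them with other clauses.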

Thanks, Claudio.


#5

Hi,

Thanks for this example. Is there some way to reference the nested field inside the function itself?
When I wrap the whole function in a nested query like this, I start getting errors about matching parent documents: "child query must only match non-parent docs, but parent docID=2147483647 matched childScorer=class org.apache.lucene.search.ConjunctionScorer".

If I apply the nested query only to the filter part, where it works correctly (I have tried replacing field_value_factor with a simple weight function), then I start getting missing field errors: "Missing value for field [fieldName]". That makes no sense to me, because if the filter on the function matches only the correct documents, then the value must always be present.

Could you please elaborate on this and on possible syntax options?

Thanks in advance!


(David Judd) #6

The syntax @msimos describes doesn't work for me with Elasticsearch 2.2. Should it? The "missing" fallback is always used, even when the score field is present in the _source returned.

Example query:

{
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "field": "authors.interestingness",
            "factor": 1,
            "missing": 1
          }
        }
      ],
      "boost_mode": "multiply",
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "should": [
                {
                  "query_string": {
                    "query": "Open Access",
                    "fields": [
                      "title"
                    ],
                    "boost": 1000,
                    "default_operator": "AND",
                    "allow_leading_wildcard": false
                  }
                },
                {
                  "match": {
                    "title": {
                      "query": "Open Access",
                      "type": "phrase",
                      "boost": 1000000
                    }
                  }
                }
              ]
            }
          },
          "filter": {
            "and": [
              {
                "not": {
                  "filter": {
                    "or": [
                      {
                        "term": {
                          "spam": true
                        }
                      }
                    ]
                  }
                }
              }
            ]
          }
        }
      }
    }
  },
  "_source": true,
  "explain": true
}

Schema, with some irrelevant fields snipped:

{
  "works-advanced-production-4": {
    "aliases": {
      "works-advanced-production": {}
    },
    "mappings": {
      "work": {
        "_all": {
          "enabled": false
        },
        "properties": {
          "attachments": {
            "type": "nested",
            "properties": {
              "created_at": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              },
              "id": {
                "type": "integer"
              },
              "text": {
                "type": "string"
              }
            }
          },
          "authors": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "integer"
              },
              "interestingness": {
                "type": "float"
              },
              "name": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "id": {
            "type": "integer"
          },
          "index_error": {
            "type": "boolean"
          },
          "metadata": {
            "properties": {
              "abstract": {
                "type": "string"
              }
            }
          },
          "spam": {
            "type": "boolean"
          },
          "title": {
            "type": "string"
          },
          "views": {
            "type": "integer"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_replicas": "2",
        "number_of_shards": "6"
      }
    },
    "warmers": {}
  }
}

(Nate Sullivan) #7

@djudd the query you posted doesn't work because the function_score query doesn't have a parent nested query. You'd have to add a parent nested query like so:

{
  "query": {
    "nested": {
      "path": "authors",
      "query": {
        "function_score": {
          "functions": [
            {
              "field_value_factor": {
                "field": "authors.interestingness",
                "factor": 1,
                "missing": 1
              }
            }
          ],
          "boost_mode": "multiply",
          "query": {
            "filtered": {
              ... same as before ...
            }
          }
        }
      }
    }
  }
}

Unfortunately, even that won't work because your filtered query (which matches against non-nested fields) will break once it's inside the nested context.

The desired behavior is:

  1. Run a query against non-nested fields.
  2. Multiply the score from that query by a nested field's value.

As far as I can tell, there's no way to implement that desired behavior until this feature request is completed.
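
To make those two steps concrete, here is a small model of the arithmetic in plain Python, applied to hits after they come back. It is not an Elasticsearch feature and only reorders the hits already returned. The function name rescore, the sample hits and the choice of the largest nested value when there are several authors are all assumptions for illustration.

```python
def rescore(hits, path, field, missing=1.0):
    """Multiply each hit's _score by the largest `field` value under `path`.

    Hits without any such value are multiplied by `missing`.
    """
    rescored = []
    for hit in hits:
        nested_objects = hit["_source"].get(path, [])
        values = [obj[field] for obj in nested_objects if field in obj]
        factor = max(values) if values else missing
        rescored.append({**hit, "_score": hit["_score"] * factor})
    # Highest score first, like a search response.
    return sorted(rescored, key=lambda h: h["_score"], reverse=True)


# Made-up hits, shaped like the "hits" array of a search response.
hits = [
    {"_id": "a", "_score": 2.0,
     "_source": {"authors": [{"interestingness": 0.5}]}},
    {"_id": "b", "_score": 1.0,
     "_source": {"authors": [{"interestingness": 3.0},
                             {"interestingness": 1.0}]}},
    {"_id": "c", "_score": 1.5, "_source": {}},
]
result = rescore(hits, "authors", "interestingness")
print([(h["_id"], h["_score"]) for h in result])
```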

(Like it says in that issue, if you wanted to add scores instead of multiply, you could use a bool query. And if you wanted the max, you could use dis_max.)
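
A rough sketch of the additive variant, built as a Python dict. It assumes the main query goes in "must" and the nested function_score in "should", so the nested value is added on top of the main score for documents that have nested authors. build_additive_query and its parameters are illustrative names, and the score_mode default shown is the nested query's own default (avg); if several nested objects exist, something like "max" or "sum" may be closer to what you want.

```python
import json


def build_additive_query(main_query, path, field, score_mode="avg"):
    """Add a nested field's value to the main query's score using bool.

    build_additive_query is an illustrative helper, not a client API.
    """
    nested_score = {
        "nested": {
            "path": path,
            "score_mode": score_mode,
            "query": {
                "function_score": {
                    # match_all scores 1, so multiply yields the field value.
                    "query": {"match_all": {}},
                    "functions": [
                        {
                            "field_value_factor": {
                                "field": field,
                                "factor": 1,
                                "missing": 1,
                            }
                        }
                    ],
                    "boost_mode": "multiply",
                }
            },
        }
    }
    # The main query stays outside the nested context, so it can still
    # match non-nested fields such as title.
    return {"query": {"bool": {"must": [main_query], "should": [nested_score]}}}


additive = build_additive_query(
    {"match": {"title": "Open Access"}}, "authors", "authors.interestingness"
)
print(json.dumps(additive, indent=2))
```

Swapping the outer bool for dis_max with both clauses in "queries" would give the max instead of the sum.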


(Felipe Besson) #8

Hello, I have a related question that I couldn't answer from the previous discussion.
I want to boost by a nested object field, but only for nested objects that pass a filter. Let's say I have these documents:

POST /test/doc/1
{
  "title": "doc 1",
  "queries": [
    {
      "hash": "102a",
      "value": 0.9
    },
    {
      "hash": "101a",
      "value": 0.9
    }
  ]
}

POST /test/doc/2
{
  "title": "doc 2",
  "queries": [
    {
      "hash": "102a",
      "value": 1.0
    },
    {
      "hash": "101a",
      "value": 0.8
    }
  ]
}

POST /test/doc/3
{
  "title": "doc 3",
  "queries": [
    {
      "hash": "103a",
      "value": 0.9
    },
    {
      "hash": "101a",
      "value": 0.3
    }
  ]
}

I want to match all documents but boost those that have a specific hash value:

GET /test/doc/_search
{
    "query": {
      "nested": {
        "path": "queries",
        "query": {
          "function_score": {
            "query": { "match_all": {} },
            "functions": [
              {
                "field_value_factor": {
                  "field": "queries.value",
                  "missing": 0
                },
                "filter": {
                    "match": {"queries.hash": "102a"}
                }
              }
            ]
          }
        }
      }
    }
}

With this query I expected doc 2, doc 1 and doc 3 to be returned in that order, given that I am using queries.value for the hash value 102a. However, that is not the result: I get a different order and a constant score of 1 for each document.

Could you please point out any mistakes in my query, or tell me if it is not possible to do this with nested documents in function scores?

Thank you!

