Providing weight to individual fields


(Josh Harrison) #1

I have a nested document structure like
{"mystaff":[{"name":"bob", "value":0.375}, {"name":"fred", "value":0.8537}]}

Is it possible to use the value associated with bob or fred as a weight value in a query when determining the "best" document? That is, if I have both
{"_index":"staff", "_type":"performance", "_id":1,"_source":{"mystaff":[{"name":"bob", "value":0.375}, {"name":"fred", "value":0.8537}]}}
{"_index":"staff", "_type":"performance", "_id":2,"_source":{"mystaff":[{"name":"bob", "value":0.99743}, {"name":"fred", "value":0.5583}]}}

I'd like to be able to search for "bob" and get back /staff/performance/2 as a higher scoring record than /staff/performance/1, based on the value Bob has at mystaff.value.

Note this is an extremely simplified example - I ultimately want to build something akin to a list of performance scores for many hundreds of "staff" per "performance event", then be able to issue a separate weighted query where I care more about how Kerry and Sue did than Bob and Fred, but I don't care how Jerry did or if he even participated. I know that'll end up being a large bool query with "should" and "must" clauses, along with weighting of individual terms, but this seems like the core of an approach like that.
That is to say, I want to query the events where overall Kerry and Sue did really well, Bob and Fred doing well is a bonus but not hugely impactful, and Jerry may or may not have participated in the event.

Edit: Is this where https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-function-score-query.html would come into play?


(Mark Walkom) #2

https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl-query-string-query.html#_boosting might be more what you want.


(Josh Harrison) #3

I think I'd need something beyond just boosting - as far as I can tell, that's saying "when I am doing my query, I care more about tokens X and Y than I do about Z".

The first part of what I need is that if I query "fred" without any specific boosting, I get back document 1, then document 2 (because "fred" has a value of 0.8537 in doc 1 and 0.5583 in doc 2). Similarly, I want a search for "bob" without any specific boosting to return document 2, then document 1 (in doc 1 bob has a value of 0.375; in doc 2 bob has a value of 0.99743).
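The baseline ordering described above can be pinned down with a small reference sketch in plain Python, independent of Elasticsearch. It uses the two example documents from the first post; the helper name `rank_by_staff_value` is hypothetical and only states the target behavior, not how Elasticsearch actually scores.

```python
# Example documents from the first post, keyed by document id.
DOCS = {
    1: {"mystaff": [{"name": "bob", "value": 0.375},
                    {"name": "fred", "value": 0.8537}]},
    2: {"mystaff": [{"name": "bob", "value": 0.99743},
                    {"name": "fred", "value": 0.5583}]},
}


def rank_by_staff_value(docs, name):
    """Return ids of docs containing `name`, highest stored value first.

    Hypothetical helper: a statement of the desired ordering only.
    """
    scored = []
    for doc_id, source in docs.items():
        values = [s["value"] for s in source["mystaff"] if s["name"] == name]
        if values:  # skip documents where the person does not appear
            scored.append((max(values), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


print(rank_by_staff_value(DOCS, "fred"))  # [1, 2]
print(rank_by_staff_value(DOCS, "bob"))   # [2, 1]
```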

Then on top of that, I want to apply query-side boosting. I want to put together a query like this (I know this isn't a proper nested query; for now it's just an example): {"query":{"bool":{"should":[{"term":{"mystaff.name":{"value":"bob","boost":0.89}}},{"term":{"mystaff.name":{"value":"fred","boost":0.67}}}]}}}

Would this end up as a bunch of decay functions inside bool clauses, something like this?

{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "functions": [
              {
                "gauss": {
                  "mystaff.value": {
                    "origin": 0,
                    "scale": 1
                  }
                }
              }
            ],
            "score_mode": "multiply",
            "query": {
              "match": {
                "mystaff.name": {
                  "query": "fred",
                  "boost": 0.67
                }
              }
            }
          }
        },
        {
          "function_score": {
            "functions": [
              {
                "gauss": {
                  "mystaff.value": {
                    "origin": 0,
                    "scale": 1
                  }
                }
              }
            ],
            "query": {
              "match": {
                "mystaff.name": {
                  "query": "bob",
                  "boost": 0.89
                }
              }
            }
          }
        }
      ]
    }
  }
}

(Josh Harrison) #4

I was able to do this with something very similar to the query above. I ended up needing two nested queries wrapping a bool query, whose "should" clause contains a gauss function_score per name. Each gauss function points at the individual weight in the nested record, with an origin of 1 and a scale of 1 (so the further the value is from 1, the lower it is weighted, if I understand correctly). Each function_score also has a "boost" parameter carrying the query-side weight for that name.
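To check that reading of origin and scale, here is a minimal Python sketch of the Gaussian decay formula as given in the function_score documentation. It assumes the default `decay` of 0.5 and `offset` of 0, and it only computes the decay factor; by default function_score multiplies that factor with the match query's own relevance score, which is not modeled here.

```python
import math


def gauss_decay(value, origin=1.0, scale=1.0, offset=0.0, decay=0.5):
    """Gaussian decay factor for a numeric field value.

    A value exactly `scale` away from `origin` (beyond `offset`)
    gets a factor equal to `decay`.
    """
    distance = max(0.0, abs(value - origin) - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Stored values from the two example documents.
samples = [("bob, doc 1", 0.375), ("bob, doc 2", 0.99743),
           ("fred, doc 1", 0.8537), ("fred, doc 2", 0.5583)]
for label, value in samples:
    print(f"{label}: {gauss_decay(value):.4f}")
```

With these settings Bob's factor is roughly 0.76 in document 1 and very close to 1.0 in document 2, so document 2 ranks higher for "bob". Note that for values between 0 and 1 the factor only ranges from 0.5 to 1.0, so a smaller scale may be worth trying if a stronger spread is wanted.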

The query is essentially this, given my previous example records:

{
  "query": {
    "nested": {
      "score_mode": "sum",
      "path": "mystaff",
      "query": {
        "bool": {
          "should": [
            {
              "function_score": {
                "query": {
                  "match": {
                    "mystaff.name": "bob"
                  }
                },
                "functions": [
                  {
                    "gauss": {
                      "mystaff.value": {
                        "origin": 1,
                        "scale": 1
                      }
                    }
                  }
                ],
                "boost": 0.89,
                "score_mode": "multiply"
              }
            },
            {
              "function_score": {
                "query": {
                  "match": {
                    "mystaff.name": "fred"
                  }
                },
                "functions": [
                  {
                    "gauss": {
                      "mystaff.value": {
                        "origin": 1,
                        "scale": 1
                      }
                    }
                  }
                ],
                "boost": 0.67,
                "score_mode": "multiply"
              }
            }
          ]
        }
      }
    }
  }
}
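For many hundreds of staff, writing each clause by hand gets unwieldy, so the same body could be generated from a name-to-weight mapping. This is a sketch only: the helper `build_weighted_staff_query` is hypothetical, and the field names simply follow the example documents.

```python
import json


def build_weighted_staff_query(weights, path="mystaff", origin=1, scale=1):
    """Build the nested/bool/function_score body shown above.

    `weights` maps a staff name to its query-side boost.
    Hypothetical helper; field names follow the example documents.
    """
    should = []
    for name, boost in weights.items():
        should.append({
            "function_score": {
                "query": {"match": {f"{path}.name": name}},
                "functions": [
                    {"gauss": {f"{path}.value": {"origin": origin,
                                                 "scale": scale}}}
                ],
                "boost": boost,
                "score_mode": "multiply",
            }
        })
    return {
        "query": {
            "nested": {
                "score_mode": "sum",
                "path": path,
                "query": {"bool": {"should": should}},
            }
        }
    }


body = build_weighted_staff_query({"bob": 0.89, "fred": 0.67})
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the search request body with whatever HTTP or Elasticsearch client is already in use.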
