I have a nested document structure like {"mystaff":[{"name":"bob", "value":0.375}, {"name":"fred", "value":0.8537}]}
Is it possible to use the value associated with bob or fred as a weight value in a query when determining the "best" document? That is, if I have both
{"_index":"staff", "_type":"performance", "_id":1,"_source":{"mystaff":[{"name":"bob", "value":0.375}, {"name":"fred", "value":0.8537}]}}
{"_index":"staff", "_type":"performance", "_id":2,"_source":{"mystaff":[{"name":"bob", "value":0.99743}, {"name":"fred", "value":0.5583}]}}
I'd like to be able to search for "bob" and get back /staff/performance/2 as a higher scoring record than /staff/performance/1, based on the value Bob has at mystaff.value.
Note this is an extremely simplified example - I ultimately want to build something akin to a list of performance scores for many hundreds of "staff" per "performance event", then be able to issue a separate weighted query where I care more about how Kerry and Sue did than Bob and Fred, but I don't care how Jerry did or if he even participated. I know that'll end up being a large bool query with "should" and "must" clauses, along with weighting of individual terms, but this seems like the core approach to something like that.
That is to say, I want to query the events where overall Kerry and Sue did really well, Bob and Fred doing well is a bonus but not hugely impactful, and Jerry may or may not have participated in the event.
I think I'd need something beyond just query-time boosting - as far as I can tell, boosting only says "when I am doing my query, I care more about tokens X and Y than I do Z"; it doesn't take the value stored in each document into account.
The first part of what I need is this: if I query "fred" without any specific boosting, I get back document id 1, then document id 2 (because "fred" has a value of 0.8537 in doc 1, and 0.5583 in doc 2). Similarly, I want to be able to search for "bob" without any specific boosting and get back document 2, then document 1 (in doc 1 bob has a value of 0.375, in doc 2 bob has a value of 0.99743).
Then on top of that, I want to apply query-side boosting - I want to put together a query like this (I know I'm not doing the nested query right; at the moment this is just an example): {"query":{"bool":{"should":[{"term":{"mystaff.name":{"value":"bob","boost":0.89}}},{"term":{"mystaff.name":{"value":"fred","boost":0.67}}}]}}}
Would this end up looking like using a bunch of decay functions in boolean statements like:
I was able to do this with something very similar to the query above - I ended up needing two nested queries, containing a bool query, with a "should" clause containing a bunch of gauss function scores pointing to the individual weight in each nested record, with an origin of 1 and a scale of 1 (so the further away from 1 the value was, the lower it was weighted, if I understand correctly). Each function_score has a "boost" parameter corresponding to the value of the searched weight.
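To sanity-check the "origin of 1, scale of 1" reasoning, here is a small local sketch in Python of the gauss decay formula as given in the Elasticsearch function_score documentation, applied to the two example records. It runs entirely offline and does not talk to a cluster. The default decay of 0.5 and offset of 0 are assumed, and the names gauss_decay, rank and weighted_score are made up for this illustration. Summing boost times decay is only a simplified model of how a bool "should" adds up its matching clauses.

```python
import math

def gauss_decay(value, origin=1.0, scale=1.0, decay=0.5, offset=0.0):
    """Gauss decay per the Elasticsearch docs (decay=0.5, offset=0 assumed as defaults)."""
    # sigma^2 is chosen so the score equals `decay` at distance `scale` from the origin
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

# The two example records, keyed by document id
docs = {
    1: {"bob": 0.375, "fred": 0.8537},
    2: {"bob": 0.99743, "fred": 0.5583},
}

def rank(name):
    """Order document ids by the decay score of one staff member, best first."""
    scored = {doc_id: gauss_decay(staff[name])
              for doc_id, staff in docs.items() if name in staff}
    return sorted(scored, key=scored.get, reverse=True)

def weighted_score(staff, weights):
    """Simplified model: sum of boost * decay over the names that are present."""
    return sum(boost * gauss_decay(staff[name])
               for name, boost in weights.items() if name in staff)

print(rank("bob"))   # [2, 1]
print(rank("fred"))  # [1, 2]
print({doc_id: round(weighted_score(staff, {"bob": 0.89, "fred": 0.67}), 4)
       for doc_id, staff in docs.items()})
```

So a value close to 1 scores close to 1, and a value a full scale away from the origin (here 0) scores 0.5, which matches the "further away from 1, lower weight" reading.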
Essentially this, given my previous example records:
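The query body itself is not reproduced here, so what follows is only a rough reconstruction from the description above, not necessarily the exact query that was used. It is a Python sketch that builds one nested query per staff name inside a bool "should", each wrapping a function_score with a gauss decay on mystaff.value (origin 1, scale 1) and a per-name boost. It assumes mystaff is mapped as a nested type. The "score_mode" and "boost_mode" settings are my own guesses, and staff_clause and build_query are hypothetical helper names.

```python
import json

def staff_clause(name, boost, path="mystaff"):
    """One nested clause: match a staff name, score by gauss decay on its stored value."""
    return {
        "nested": {
            "path": path,               # assumes `mystaff` is mapped as type "nested"
            "score_mode": "max",        # assumption: one matching entry per name
            "query": {
                "function_score": {
                    "query": {"term": {f"{path}.name": name}},
                    "functions": [
                        {"gauss": {f"{path}.value": {"origin": 1, "scale": 1}}}
                    ],
                    "boost_mode": "replace",  # assumption: ignore the term score itself
                    "boost": boost,           # query-side weight for this person
                }
            },
        }
    }

def build_query(weights):
    """Combine one clause per staff member in a bool/should."""
    return {
        "query": {
            "bool": {
                "should": [staff_clause(name, boost) for name, boost in weights.items()]
            }
        }
    }

body = build_query({"bob": 0.89, "fred": 0.67})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the staff index. People who must have participated could go into a "must" list built the same way, and people you don't care about (Jerry) are simply left out.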