Minimum Missed Conditions Search

Hi guys, I just started learning ES and it is fun to play with the data and try to apply the queries to some special cases.

These days I am trying to build two sorting methods to help return better results:
Max Used Conditions
Min Missing Conditions

I can achieve maximum used conditions simply with a bool query:

GET test/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {"match": {"tags": {"query": "condition A","fuzziness": "AUTO"}}},
        {"match": {"tags": {"query": "condition B","fuzziness": "AUTO"}}},
        {"match": {"tags": {"query": "condition C","fuzziness": "AUTO"}}},       
...

tags is an array containing different conditions.
The more elements of tags that match in a doc, the higher its score will be.

However, I have not managed to sort the results by minimum missed conditions.
For example, I have the following items with tags:
ItemA: [condition A, condition B, condition C]
ItemB: [condition A, condition B, condition D, condition F]
ItemC: [condition A, condition B, condition D, condition E, condition F]

When I search with [condition A, condition B, condition C, condition E]:
ItemA: 3 of 3 tags matched (100%)
ItemB: 2 of 4 tags matched (50%)
ItemC: 3 of 5 tags matched (60%)
I would like the sort order to be: ItemA > ItemC > ItemB
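
To make the target concrete, here is a small Python sketch of the ratio I am after (matched tags divided by the item's total tags). It only reproduces the arithmetic of the example above with exact string matching; the names matched_ratio, items and searched are made up for illustration and have nothing to do with the ES API.

```python
# Tags per item and the searched conditions, copied from the example above.
items = {
    "ItemA": ["condition A", "condition B", "condition C"],
    "ItemB": ["condition A", "condition B", "condition D", "condition F"],
    "ItemC": ["condition A", "condition B", "condition D", "condition E", "condition F"],
}
searched = ["condition A", "condition B", "condition C", "condition E"]


def matched_ratio(tags, conditions):
    """Fraction of an item's tags that appear among the searched conditions."""
    wanted = set(conditions)
    matched = sum(1 for tag in tags if tag in wanted)
    return matched / len(tags)


ratios = {name: matched_ratio(tags, searched) for name, tags in items.items()}
ranking = sorted(ratios, key=ratios.get, reverse=True)
print(ratios)   # {'ItemA': 1.0, 'ItemB': 0.5, 'ItemC': 0.6}
print(ranking)  # ['ItemA', 'ItemC', 'ItemB']
```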

I wrote a script function myself, but it ended up with three nested for loops. I know that is not practical, and it does not use any of ES's advanced search features. Is there any way to get the counts of used and missed conditions so that I can achieve what I want? Thank you.

POST _scripts/min-missing-conditions
{
	"script":{
     "lang": "painless",
     "source": """
         def cnt = 0.0;
         // I am trying to achieve kind of fuzzy search in the script...
         for (int i = 0; i < doc['tags.keyword'].length; ++i) {
             for (int j = 0; j < params.tags.length; ++j) {
                 String[] tagsSplit = params.tags[j].splitOnToken(" ");
                 for (int k = 0; k < tagsSplit.length; ++k) {
                     if (doc['tags.keyword'][i].contains(tagsSplit[k])) {
                         cnt = cnt + 1.0 / (2.0 * tagsSplit.length);
                     }
                     if (tagsSplit[k].contains(doc['tags.keyword'][i])) {
                         cnt = cnt + 1.0 / (2.0 * tagsSplit.length);
                     }
                 }
             }
         }
         return cnt / doc['tags.keyword'].length;
   """
   }
}
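
To see what this script yields for the example items, here is a rough Python port of the same three loops. It assumes that Python's `in` and `str.split(" ")` behave like Painless `contains` and `splitOnToken(" ")` for these inputs, and that the tags come back as listed; the name script_score_port is made up for this sketch.

```python
def script_score_port(doc_tags, param_tags):
    """Python port of the Painless loops: partial credit per token, per tag pair."""
    cnt = 0.0
    for doc_tag in doc_tags:
        for param_tag in param_tags:
            tokens = param_tag.split(" ")
            for token in tokens:
                # Credit when the doc tag contains the token ...
                if token in doc_tag:
                    cnt += 1.0 / (2.0 * len(tokens))
                # ... and again when the token contains the whole doc tag.
                if doc_tag in token:
                    cnt += 1.0 / (2.0 * len(tokens))
    return cnt / len(doc_tags)


searched = ["condition A", "condition B", "condition C", "condition E"]
items = {
    "ItemA": ["condition A", "condition B", "condition C"],
    "ItemB": ["condition A", "condition B", "condition D", "condition F"],
    "ItemC": ["condition A", "condition B", "condition D", "condition E", "condition F"],
}
scores = {name: script_score_port(tags, searched) for name, tags in items.items()}
print(scores)  # {'ItemA': 1.25, 'ItemB': 1.125, 'ItemC': 1.15}
```

The order still comes out as ItemA > ItemC > ItemB, but because every tag shares the word "condition", each doc tag / search tag pair earns 0.25 even when the tags differ, so the scores sit much closer together than the 100% / 60% / 50% ratios.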

I can more or less get the number of matched conditions by using function_score with functions:

GET test/_search
{
  "query": {
  "function_score":{
    "query": {...},
    "functions": [{
          "filter": {"match": {"tags": {"query": "condition A","fuzziness": "AUTO"}}},
          "weight": 1
        }
...
    ],
    "score_mode": "sum",
    "boost_mode": "replace"

and get the tags array length via script:

"source": "params['_source'].tags.length"

However, I cannot use the two together, because the function_score query rejects the request with: "[you can either define [functions] array or a single function, not both. already found [functions] array, now encountering [script_score].]"
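
The error seems to come from putting script_score next to the functions array in the same function_score. One possible way around it, which I have not run against a cluster, is to nest two function_score queries: the inner one counts the matched filters, and the outer one divides that count by the number of tags in a single script_score. The Python sketch below only builds the request body. The helper build_ratio_query is made up, the match_all base query is a placeholder for the elided "query" part above, and the script assumes every doc has at least one value in tags.keyword.

```python
import json


def build_ratio_query(conditions, base_query):
    """Build a search body that scores docs by matched filters / number of tags."""
    # Inner function_score: every matching filter contributes weight 1, so with
    # score_mode "sum" and boost_mode "replace" the score is the matched count.
    inner = {
        "function_score": {
            "query": base_query,
            "functions": [
                {
                    "filter": {"match": {"tags": {"query": c, "fuzziness": "AUTO"}}},
                    "weight": 1,
                }
                for c in conditions
            ],
            "score_mode": "sum",
            "boost_mode": "replace",
        }
    }
    # Outer function_score: a single script_score (no functions array here)
    # divides the inner score by the tag count. Assumption: tags.keyword is
    # never empty, otherwise this divides by zero.
    outer = {
        "function_score": {
            "query": inner,
            "script_score": {
                "script": {"source": "_score / doc['tags.keyword'].length"}
            },
            "boost_mode": "replace",
        }
    }
    return {"query": outer}


body = build_ratio_query(
    ["condition A", "condition B", "condition C", "condition E"],
    {"match_all": {}},  # placeholder base query, not the original one
)
print(json.dumps(body, indent=2))  # paste the output as the body of GET test/_search
```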
