Hi guys, I just started learning Elasticsearch (ES), and it is fun to play with the data and try applying queries to some special cases.
Lately I have been trying to build two sorting methods to help return better results:
Max Used Conditions
Min Missing Conditions
I can achieve maximum used conditions simply with:
GET test/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {"match": {"tags": {"query": "condition A", "fuzziness": "AUTO"}}},
        {"match": {"tags": {"query": "condition B", "fuzziness": "AUTO"}}},
        {"match": {"tags": {"query": "condition C", "fuzziness": "AUTO"}}},
...
tags is an array containing different conditions.
The more elements of tags that match in a doc, the higher its score will be.
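To make the shape of that query concrete, here is a minimal sketch that builds the same request body from a list of conditions. The helper name build_max_used_query and its parameters are hypothetical, made up for illustration; it only constructs the JSON body and does not talk to a cluster.

```python
import json


def build_max_used_query(conditions, field="tags", fuzziness="AUTO"):
    """Build a bool query with one fuzzy match clause per condition.

    Hypothetical helper: each condition becomes its own "should" clause,
    so a doc matching more conditions collects more clause scores.
    """
    should = [
        {"match": {field: {"query": condition, "fuzziness": fuzziness}}}
        for condition in conditions
    ]
    return {"query": {"bool": {"minimum_should_match": 1, "should": should}}}


body = build_max_used_query(["condition A", "condition B", "condition C"])
print(json.dumps(body, indent=2))
```

The printed body can be pasted after GET test/_search as-is.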
However, I am stuck on sorting the results by minimum missing conditions.
For example, I have the following items with tags:
ItemA: [condition A, condition B, condition C]
ItemB: [condition A, condition B, condition D, condition F]
ItemC: [condition A, condition B, condition D, condition E, condition F]
I search with [condition A, condition B, condition C, condition E]:
ItemA: 3/3 tags matched, 100%
ItemB: 2/4 tags matched, 50%
ItemC: 3/5 tags matched, 60%
I would like the sort order to be: ItemA > ItemC > ItemB
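The arithmetic behind that ordering can be sketched outside ES like this. It is only an illustration of the ratio I am after, using exact string matching (no fuzziness); the function name matched_ratio is made up for the example.

```python
def matched_ratio(item_tags, query_tags):
    """Fraction of an item's own tags that appear in the query conditions."""
    query = set(query_tags)
    matched = sum(1 for tag in item_tags if tag in query)
    return matched / len(item_tags)


items = {
    "ItemA": ["condition A", "condition B", "condition C"],
    "ItemB": ["condition A", "condition B", "condition D", "condition F"],
    "ItemC": ["condition A", "condition B", "condition D", "condition E", "condition F"],
}
query = ["condition A", "condition B", "condition C", "condition E"]

# Highest ratio first, i.e. fewest missing conditions relative to the item's size.
ranked = sorted(items, key=lambda name: matched_ratio(items[name], query), reverse=True)
print(ranked)  # ['ItemA', 'ItemC', 'ItemB']
```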
I wrote a script function myself, but it ended up with three nested for loops. I know it is not practical, and it doesn't use any of ES's advanced search features. Is there any way to get the counts of used and missed conditions back from ES to help achieve what I want? Thank you. Here is my current script:
POST _scripts/min-missing-conditions
{
  "script": {
    "lang": "painless",
    "source": """
      def cnt = 0.0;
      // Rough attempt at fuzzy matching inside the script:
      // compare every doc tag against each word of every query tag.
      for (int i = 0; i < doc['tags.keyword'].length; ++i) {
        for (int j = 0; j < params.tags.length; ++j) {
          String[] tagsSplit = params.tags[j].splitOnToken(" ");
          for (int k = 0; k < tagsSplit.length; ++k) {
            // Doc tag contains the query word.
            if (doc['tags.keyword'][i].contains(tagsSplit[k])) {
              cnt = cnt + 1.0/(2.0 * tagsSplit.length);
            }
            // Query word contains the whole doc tag.
            if (tagsSplit[k].contains(doc['tags.keyword'][i])) {
              cnt = cnt + 1.0/(2.0 * tagsSplit.length);
            }
          }
        }
      }
      // Normalize by the number of tags on the doc.
      return cnt / doc['tags.keyword'].length;
    """
  }
}
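For checking the scoring logic without a cluster, here is a rough Python port of the script above. It is a sketch under assumptions: doc_tags stands in for doc['tags.keyword'], query_tags for params.tags, and Python's split(" ") and substring test are assumed to behave like splitOnToken(" ") and contains for these simple strings.

```python
def min_missing_score(doc_tags, query_tags):
    """Python port of the three nested loops in the Painless script."""
    cnt = 0.0
    for doc_tag in doc_tags:
        for query_tag in query_tags:
            tokens = query_tag.split(" ")
            for token in tokens:
                # Doc tag contains the query word.
                if token in doc_tag:
                    cnt += 1.0 / (2.0 * len(tokens))
                # Query word contains the whole doc tag.
                if doc_tag in token:
                    cnt += 1.0 / (2.0 * len(tokens))
    # Normalize by the number of tags on the doc.
    return cnt / len(doc_tags)


items = {
    "ItemA": ["condition A", "condition B", "condition C"],
    "ItemB": ["condition A", "condition B", "condition D", "condition F"],
    "ItemC": ["condition A", "condition B", "condition D", "condition E", "condition F"],
}
query = ["condition A", "condition B", "condition C", "condition E"]

scores = {name: min_missing_score(tags, query) for name, tags in items.items()}
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, scores[name])
```

In this port the shared word "condition" adds 0.25 to every doc tag / query tag pair, so all three scores land close together above 1.0 even though the order still comes out as ItemA, ItemC, ItemB. With tags that do not share a common word the numbers would look quite different.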