Given an index with the following mapping:
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "whitespace",
        "similarity": "boolean"
      },
      "field2": {
        "type": "text",
        "analyzer": "whitespace",
        "similarity": "boolean"
      }
    }
  }
}
And the following documents indexed into it (bulk request body):
{ "index" : {} }
{ "field1": "foo", "field2": "bar"}
{ "index" : {} }
{ "field1": "foo1 foo2", "field2": "bar1 bar2"}
{ "index" : {} }
{ "field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3"}
For the following bool query with one fuzzy clause per field:
{
  "size": 10,
  "min_score": 0.4,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "fuzzy": {
                "field1": {
                  "value": "foo",
                  "fuzziness": "AUTO",
                  "boost": 1
                }
              }
            },
            {
              "fuzzy": {
                "field2": {
                  "value": "bar",
                  "fuzziness": "AUTO",
                  "boost": 1
                }
              }
            }
          ]
        }
      }
    }
  }
}
The top hit is always the document with "foo1 foo2 foo3" / "bar1 bar2 bar3", even though the index contains an exact match (the first document I indexed), which ends up last:
{
  "took": 114,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 3.9999998,
    "hits": [
      {
        "_index": "test_index2",
        "_type": "_doc",
        "_id": "bXw8eXUBCTtfNv84bNPr",
        "_score": 3.9999998,
        "_source": {
          "field1": "foo1 foo2 foo3",
          "field2": "bar1 bar2 bar3"
        }
      },
      {
        "_index": "test_index2",
        "_type": "_doc",
        "_id": "bHw8eXUBCTtfNv84bNPr",
        "_score": 2.6666665,
        "_source": {
          "field1": "foo1 foo2",
          "field2": "bar1 bar2"
        }
      },
      {
        "_index": "test_index2",
        "_type": "_doc",
        "_id": "a3w8eXUBCTtfNv84bNPr",
        "_score": 2.0,
        "_source": {
          "field1": "foo",
          "field2": "bar"
        }
      }
    ]
  }
}
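I understand why this happens: with boolean similarity every matching term adds its boost to the score, so a document with more fuzzy-matching tokens outranks one with a single exact match. I also know I could rescore, but that is not an option here because I don't know how many top N results to fetch.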
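To make the arithmetic behind those scores concrete, here is a rough Python model of what I believe is happening. It is only a sketch under assumptions: I assume each term the fuzzy query expands to is scored as 1 - edit_distance / min(query length, term length), that AUTO allows one edit for a 3-character value, and that boolean similarity simply adds one such score per matching token. The helper names (edit_distance, token_score, summed_score) are made up for illustration, and plain Levenshtein stands in for whatever distance Lucene actually uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def token_score(query: str, token: str, max_edits: int = 1) -> float:
    """Assumed per-token score: 1.0 for an exact match, less per edit."""
    d = edit_distance(query, token)
    if d > max_edits:
        return 0.0  # token is not matched by the fuzzy query at all
    return 1.0 - d / min(len(query), len(token))


def summed_score(doc: dict, queries: dict) -> float:
    """Every matching token in every queried field adds to the score."""
    return sum(token_score(q, tok)
               for field, q in queries.items()
               for tok in doc[field].split())  # whitespace analyzer


DOCS = [
    {"field1": "foo", "field2": "bar"},
    {"field1": "foo1 foo2", "field2": "bar1 bar2"},
    {"field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3"},
]
QUERIES = {"field1": "foo", "field2": "bar"}

for doc in DOCS:
    print(doc["field1"], "->", round(summed_score(doc, QUERIES), 4))
```

Under these assumptions the model gives roughly 2.0, 2.6667 and 4.0 for the three documents, which lines up with the _score values in the response above: six near matches at about 0.667 each beat two exact matches at 1.0 each.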
Are there any other options? One idea is to write my own similarity plugin, based on boolean similarity, that drops the duplicate matches and keeps only the best-matching token per field. However, I don't know where to start: the only samples I can find are for script and rescore.
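To show the ranking I am after, here is a small Python sketch of "keep only the best-matching token per field". This is not Elasticsearch plugin code, just a model of the desired behaviour. It assumes a per-token score of 1 - edit_distance / min(query length, term length) with at most one edit allowed, and the function names (edit_distance, token_score, best_token_score) are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def token_score(query: str, token: str, max_edits: int = 1) -> float:
    """Assumed per-token score: 1.0 for an exact match, less per edit."""
    d = edit_distance(query, token)
    return 0.0 if d > max_edits else 1.0 - d / min(len(query), len(token))


def best_token_score(doc: dict, queries: dict) -> float:
    """Per field, count only the single best-matching token, then sum fields."""
    return sum(
        max((token_score(q, tok) for tok in doc[field].split()), default=0.0)
        for field, q in queries.items()
    )


DOCS = [
    {"field1": "foo", "field2": "bar"},
    {"field1": "foo1 foo2", "field2": "bar1 bar2"},
    {"field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3"},
]
QUERIES = {"field1": "foo", "field2": "bar"}

# Highest score first; the exact-match document should now lead.
ranked = sorted(DOCS, key=lambda d: best_token_score(d, QUERIES), reverse=True)
for doc in ranked:
    print(doc["field1"], "->", round(best_token_score(doc, QUERIES), 4))
```

With this kind of scoring the exact-match document would come first, and the number of near-duplicate tokens in a field would no longer inflate the score.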
Thanks in advance.