After a few attempts I came up with something, although I'm not completely satisfied with the results.
This is my mapping:
{
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"name_suggest": {
"type": "completion",
"contexts": [
{
"name": "country_context",
"type": "category",
"path": "country.keyword"
}
]
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
An example of data is this:
I want to show suggestions to the user while they type, using fuzzy matching to tolerate some typos.
I came up with this query, which combines a completion suggester with an exact match, because I want exact matches to score higher than suggestions.
{
"size": 15,
"query": {
"bool": {
"should": [
{
"match": {
"name.keyword": {
"query": "Marc",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"_source": {
"includes": [
"name"
],
"excludes": []
},
"suggest": {
"text": "Marc",
"complete": {
"text": "Marc",
"prefix": "Marc",
"completion": {
"field": "name_suggest",
"size": 10,
"fuzzy": {
"fuzziness": 1,
"transpositions": true,
"min_length": 3,
"prefix_length": 1,
"unicode_aware": false,
"max_determinized_states": 10000
},
"contexts": {
"country_context": [
{
"context": "IT",
"boost": 1,
"prefix": false
}
]
}
}
}
}
}
And these are the results:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"complete": [
{
"text": "Marc",
"offset": 0,
"length": 4,
"options": [
{
"text": "Mara",
"_index": "personname",
"_type": "_doc",
"_id": "2046830049",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Mara"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marat",
"_index": "personname",
"_type": "_doc",
"_id": "-1718195506",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marat"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marca",
"_index": "personname",
"_type": "_doc",
"_id": "-2041994534",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marca"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marcantonio",
"_index": "personname",
"_type": "_doc",
"_id": "-16856444",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marcantonio"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marcella",
"_index": "personname",
"_type": "_doc",
"_id": "48281663",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marcella"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marcelliano",
"_index": "personname",
"_type": "_doc",
"_id": "-836086954",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marcelliano"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marcellina",
"_index": "personname",
"_type": "_doc",
"_id": "-695286534",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marcellina"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marcellino",
"_index": "personname",
"_type": "_doc",
"_id": "-371432729",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marcellino"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marcello",
"_index": "personname",
"_type": "_doc",
"_id": "372135468",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marcello"
},
"contexts": {
"country_context": [
"IT"
]
}
},
{
"text": "Marchetto",
"_index": "personname",
"_type": "_doc",
"_id": "-1073596950",
"_score": 4,
"_routing": "global",
"_source": {
"name": "Marchetto"
},
"contexts": {
"country_context": [
"IT"
]
}
}
]
}
]
}
}
What I expected:
I expected the results to include at least "Marco" (an Italian name).
What I got:
I got these suggestions from ES, which are quite far from what the user wants:
[
"Mara",
"Marat",
"Marca",
"Marcantonio",
"Marcella",
"Marcelliano",
"Marcellina",
"Marcellino",
"Marcello",
"Marchetto"
]
I don't understand why "Marco", which is closer to the search string "Marc", is not selected.
A small detail: if I increase the suggestion size from 10 to 20, I get this response, which does contain "Marco":
[
"Mara",
"Marat",
"Marca",
"Marcantonio",
"Marcella",
"Marcelliano",
"Marcellina",
"Marcellino",
"Marcello",
"Marchetto",
"Marchina",
"Marchino",
"Marchisio",
"Marciano",
"Marciliano",
"Marcilio",
"Marco",
"Marcolina",
"Marcolino",
"Marcuccia"
]
However, the ordering of the results is still not good enough, and I don't understand how to improve it.
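As a stopgap, the ordering I have in mind could be approximated on the client. The following is only a minimal Python sketch, not something Elasticsearch does for me: it assumes I request more suggestions than I display (20 here) and then re-rank them so that names starting with the typed text come first, followed by the smallest edit distance. The helper names `levenshtein` and `rerank` are hypothetical, and the list is simply the 20 suggestions shown above.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def rerank(typed: str, suggestions: list[str]) -> list[str]:
    """Hypothetical client-side ordering: prefix matches first, then closest names."""
    typed_lower = typed.lower()

    def sort_key(name: str) -> tuple[bool, int, str]:
        lowered = name.lower()
        return (
            not lowered.startswith(typed_lower),  # False sorts before True
            levenshtein(typed_lower, lowered),
            lowered,                              # alphabetical tie-breaker
        )

    return sorted(suggestions, key=sort_key)


# The 20 suggestions returned by ES for "Marc" with size 20.
suggestions = [
    "Mara", "Marat", "Marca", "Marcantonio", "Marcella", "Marcelliano",
    "Marcellina", "Marcellino", "Marcello", "Marchetto", "Marchina",
    "Marchino", "Marchisio", "Marciano", "Marciliano", "Marcilio",
    "Marco", "Marcolina", "Marcolino", "Marcuccia",
]

# Keep only the 10 best candidates after re-ranking.
print(rerank("Marc", suggestions)[:10])
```

With this re-ranking, "Marca" and "Marco" move to the top for the input "Marc", but I would prefer to get a sensible order directly from ES.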
I would appreciate any hints. Thanks.