I couldn’t find a relevant example, so here’s my issue:
I’m developing a search-as-you-type feature using edge_ngram tokens. It tolerates spelling mistakes (via fuzzy queries) and highlights the edge_ngram token currently being matched.
The problem arises with exact matches: the highlight extends one character past the query because of the fuzzy clause. For example, with a field value of “37751” and a query of “3775”, the whole “37751” is highlighted instead of just “3775”. The fuzzy clause seems to match the longer edge_ngram token “37751” (only one edit away from “3775”), and that longer token is what ends up highlighted.
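To see why the longer token matches, here is a rough sketch of which indexed edge_ngram tokens fall within `fuzziness: "AUTO"` of “3775”. It assumes the default `AUTO:3,6` thresholds (terms of 3 to 5 characters allow one edit) and a hand-written edit distance; `levenshtein` and `auto_fuzziness` are illustrative helpers, not Elasticsearch APIs. It also ignores `prefix_length`, `max_expansions` and transpositions (Elasticsearch counts an adjacent swap as one edit by default), assuming their defaults, which do not change the outcome for these terms.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Edits allowed by fuzziness "AUTO" (i.e. AUTO:3,6) for a term of this length."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


query = "3775"
# Edge n-grams indexed for the field value "37751" (min_gram=2)
indexed_grams = ["37", "377", "3775", "37751"]

allowed = auto_fuzziness(query)  # 1 edit for a 4-character term
matches = [g for g in indexed_grams if levenshtein(query, g) <= allowed]
print(matches)  # ['377', '3775', '37751']
```

Under these assumptions the fuzzy clause matches “377” (one deletion), “3775” (exact) and “37751” (one insertion). Since all three tokens start at offset 0, the longest one, “37751”, likely determines the highlighted span.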
I’ve tried adjusting the highlight parameters, specifically setting fragment_size to the length of the query, but the issue persists.
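For context, `fragment_size` sets roughly how many characters of text each returned fragment contains; it does not decide which matched tokens get wrapped in tags. The sketch below is a simplified model, not Elasticsearch’s highlighter code, of how overlapping matches are typically combined: spans that overlap are merged into a single tagged span. The offsets are hypothetical, assuming each matched edge_ngram token starts at offset 0 of “37751”; `merge_overlapping` and `apply_tags` are illustrative names.

```python
def merge_overlapping(spans):
    """Merge (start, end) character spans that overlap into single spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it so one tag covers both
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def apply_tags(text, spans, pre="<em>", post="</em>"):
    """Wrap each merged span of text in the pre/post tags."""
    out, pos = [], 0
    for start, end in merge_overlapping(spans):
        out.append(text[pos:start])
        out.append(pre + text[start:end] + post)
        pos = end
    out.append(text[pos:])
    return "".join(out)


value = "37751"
# Hypothetical offsets of the matched tokens "377", "3775" and "37751"
print(apply_tags(value, [(0, 3), (0, 4), (0, 5)]))  # <em>37751</em>
# Same value if "37751" had not matched
print(apply_tags(value, [(0, 3), (0, 4)]))  # <em>3775</em>1
```

In this model the tagged span depends only on which tokens matched, so shrinking the fragment would not remove the extra character.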
Can you help me resolve this?
For reproducibility, here are my mappings, settings, and query. Thanks in advance.
Settings
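settings = {
    "index": {"max_ngram_diff": 10},
    "analysis": {
        "analyzer": {
            # Index-time analyzer: edge_ngram tokens of each letter/digit run
            "autocomplete": {
                "tokenizer": "autocomplete",
                "filter": ["lowercase", "asciifolding"],
            },
            # Search-time analyzer: standard tokenizer, so the query stays whole words
            "autocomplete_search": {
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            },
        },
        # "tokenizer" is a sibling of "analyzer" inside "analysis", not nested in it
        "tokenizer": {
            "autocomplete": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 10,
                "token_chars": ["letter", "digit"],
            },
        },
    },
}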
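As a rough illustration of what the `autocomplete` index analyzer produces, here is an approximation in plain Python: it splits on runs of letters and digits (standing in for `token_chars`), emits prefixes of length `min_gram` to `max_gram`, and lowercases them; `asciifolding` is not emulated. `autocomplete_tokens` is a hypothetical helper, not part of any Elasticsearch client. The point it shows is that every edge_ngram of a word shares the word’s start offset, while the `autocomplete_search` analyzer keeps “3775” as a single term.

```python
import re


def autocomplete_tokens(text, min_gram=2, max_gram=10):
    """Approximate the 'autocomplete' analyzer: (token, start, end) edge n-grams
    over runs of letters/digits, lowercased (asciifolding is not emulated)."""
    tokens = []
    for m in re.finditer(r"[^\W_]+", text):  # runs of letters/digits
        word = m.group()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            # Every prefix starts where the word starts
            tokens.append((word[:n].lower(), m.start(), m.start() + n))
    return tokens


for token in autocomplete_tokens("37751"):
    print(token)
# ('37', 0, 2)
# ('377', 0, 3)
# ('3775', 0, 4)
# ('37751', 0, 5)
```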
Mappings
mappings = {
    "properties": {
        # A_ID, B_ID and name are all copied into the single autocomplete field
        "A_ID": {"type": "text", "copy_to": "autocomplete_text"},
        "B_ID": {
            "type": "text",
            "copy_to": "autocomplete_text",
        },
        "name": {
            "type": "text",
            "copy_to": "autocomplete_text",
        },
        "autocomplete_text": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "autocomplete_search",
        },
    }
}
Query
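query = "3775"
bool_query = {  # renamed from `bool` to avoid shadowing the built-in
    "bool": {
        "should": [
            # Exact (non-fuzzy) match on the edge_ngram field, boosted
            {
                "match": {
                    "autocomplete_text": {
                        "query": query,
                        "operator": "and",
                        "boost": 10,
                    }
                }
            },
            # Fuzzy match to tolerate typos
            {
                "match": {
                    "autocomplete_text": {
                        "query": query,
                        "operator": "and",
                        "fuzziness": "AUTO",
                    }
                },
            },
        ],
    }
}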
highlight = {
    "fields": [
        {
            "autocomplete_text": {
                "fragment_size": len(query),
            }
        }
    ],
}