Highlighting issue with fuzzy query with edge_ngram tokens

lschnei · July 2, 2024, 7:08am

I couldn’t find a relevant example, so here’s my issue:

I’m developing a search-as-you-type feature using edge_ngram tokens, which allows for spelling mistakes (fuzzy queries) while highlighting the current edge_ngram token being matched.

The problem arises when the query exactly matches one of the indexed tokens: the highlight extends one character past what was typed. For example, with a field value of “37751” and a query of “3775”, the highlight covers “37751” instead of “3775”. It looks like the fuzzy clause also matches the longer token “37751” (one edit away from the query), and that longer token is the one that gets highlighted.
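To make the example concrete, here is a rough sketch in plain Python (no Elasticsearch involved) of which edge n-grams of “37751” fall within the fuzzy edit distance of “3775”. The helper names are made up for illustration. It uses plain Levenshtein distance rather than the Damerau-Levenshtein variant Elasticsearch uses, and the last step assumes the highlighter marks the union of all matching token spans, which is my guess at the behavior rather than something confirmed.

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    """Prefixes of `text`, mirroring the min_gram/max_gram values in the settings."""
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]


def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute); a simplification."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def auto_fuzziness(term):
    """Allowed edits under "fuzziness": "AUTO": 0 for 1-2 chars, 1 for 3-5, 2 above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query, field_value):
    """Edge n-grams of the field value within the allowed edit distance of the query."""
    max_edits = auto_fuzziness(query)
    return [g for g in edge_ngrams(field_value) if levenshtein(query, g) <= max_edits]


matches = fuzzy_matches("3775", "37751")
print(matches)  # ['377', '3775', '37751']

# Assumption: the highlighted span is the union of the matching prefixes,
# so the longest matching gram decides where the highlight ends.
highlight_end = max(len(g) for g in matches)
print("37751"[:highlight_end])  # 37751
```

So with a four-character query, AUTO allows one edit, and both the shorter “377” and the longer “37751” qualify alongside the exact “3775”.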

I’ve tried adjusting the highlight parameters, specifically setting fragment_size to the length of the query, but the issue persists.

Can you help me resolve this?

For reproducibility, here are my mappings, settings, and query. Thanks in advance.

Settings

{
    "index": {"max_ngram_diff": 10},
    "analysis": {
        "analyzer": {
            "autocomplete": {
                "tokenizer": "autocomplete",
                "filter": ["lowercase", "asciifolding"],
            },
            "autocomplete_search": {
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            },
        },
        "tokenizer": {
            "autocomplete": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 10,
                "token_chars": ["letter", "digit"],
            },
        },
    },
}

Mappings

{
    "properties": {
        "A_ID": {"type": "text", "copy_to": "autocomplete_text"},
        "B_ID": {
            "type": "text",
            "copy_to": "autocomplete_text",
        },
        "name": {
            "type": "text",
            "copy_to": "autocomplete_text",
        },
        "autocomplete_text": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "autocomplete_search",
        },
    }
}

Query

query = "3775"

bool = {
    "bool": {
        "should": [
            {
                "match": {
                    "autocomplete_text": {
                        "query": query,
                        "operator": "and",
                        "boost": 10,
                    }
                }
            },
            {
                "match": {
                    "autocomplete_text": {
                        "query": query,
                        "operator": "and",
                        "fuzziness": "AUTO",
                    }
                },
            },
        ],
    }
}
 
highlight = {
    "fields": [
        {
            "autocomplete_text": {
                "fragment_size": len(query),
            }
        }
    ],
}
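For anyone reproducing this, the two pieces above go into a single search request body roughly like this. It's only a sketch: `build_search_body` and `bool_query` are names I made up for illustration (the latter to avoid shadowing Python's built-in `bool`), and actually sending the body to the cluster is left out.

```python
import json


def build_search_body(query):
    """Combine the bool query and highlight settings into one request body."""
    bool_query = {
        "bool": {
            "should": [
                # Exact (non-fuzzy) clause, boosted so exact matches rank first.
                {
                    "match": {
                        "autocomplete_text": {
                            "query": query,
                            "operator": "and",
                            "boost": 10,
                        }
                    }
                },
                # Fuzzy clause that tolerates spelling mistakes.
                {
                    "match": {
                        "autocomplete_text": {
                            "query": query,
                            "operator": "and",
                            "fuzziness": "AUTO",
                        }
                    }
                },
            ],
        }
    }
    highlight = {
        "fields": [
            {"autocomplete_text": {"fragment_size": len(query)}},
        ],
    }
    return {"query": bool_query, "highlight": highlight}


body = build_search_body("3775")
print(json.dumps(body, indent=2))
```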
