Configure highlighted part

Pavel_Maltsev · January 23, 2022, 9:11pm

Main question
The user is looking for a name and enters part of it, say au, and the document with the text paul is found.
I would like the document to be highlighted like p<em>au</em>l, with only the matching part wrapped in tags.
How can I achieve this with a complex search query (a combination of match, prefix and wildcard to control relevance)?

Sub question
When do the highlight settings type, boundary_scanner and boundary_chars from the documentation come into play? In the tests described below, these settings don't change the highlighted part.

Try 1: Wildcard query with default analyzer

PUT myindex
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
POST myindex/_doc/1
{
    "name": "paul"
}
GET myindex/_search
{
    "query": {
        "wildcard": {"name": "*au*"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        },
        "type": "fvh",
        "boundary_scanner": "chars",
        "boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
    }
}

This search returns the highlight <em>paul</em>, but I need p<em>au</em>l.

Try 2: Match query with NGRAM analyzer
This one works as described in the Stack Overflow question "Highlighting part of word in elasticsearch".

PUT myindexngram
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            },
            "analyzer": {
                "index_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": [
                        "lowercase"
                    ]
                },
                "search_term_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "index_ngram_analyzer",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
POST myindexngram/_doc/1
{
    "name": "paul"
}
GET myindexngram/_search
{
    "query": {
        "match": {"name": "au"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        }
    }
}

This highlights p<em>au</em>l as desired, but:

Highlighting depends on the query type, so combining match and wildcard in a bool.should again results in <em>paul</em>.
Highlighting is not affected at all by the type, boundary_scanner and boundary_chars settings.

Elastic version 7.13.4

mayya · January 27, 2022, 10:14pm

A highlighter works on terms, so only full terms can be highlighted, whatever the terms in your index are. In your second example, au could be highlighted because it is a term in the index, which is not the case in your first example.
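One way to check which terms each index actually holds is the _analyze API. A minimal sketch, assuming the myindex and myindexngram indices from the question exist as defined there:

```console
GET myindex/_analyze
{
    "field": "name",
    "text": "paul"
}

GET myindexngram/_analyze
{
    "field": "name",
    "text": "paul"
}
```

With the default standard analyzer, the first request should return the single term paul (offsets 0 to 4), so there is no smaller unit to highlight. With the ngram tokenizer (min_gram 2, max_gram 3), the second should return the grams pa, pau, au, aul and ul, each with its own offsets, which is why au can be tagged on its own there.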
There is also an option to define your own highlight_query that could be different from the main query, but this could lead to unpredictable highlights.
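As a rough sketch of that option, assuming the myindexngram index from the question and a bool.should that combines match and wildcard as described there: the main query still drives matching and scoring, while highlight_query limits highlighting to the match clause. This has not been verified against 7.13.4, so treat it as a starting point rather than a confirmed fix.

```console
GET myindexngram/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "name": "au" } },
                { "wildcard": { "name": "*au*" } }
            ]
        }
    },
    "highlight": {
        "fields": {
            "name": {
                "highlight_query": {
                    "match": { "name": "au" }
                }
            }
        }
    }
}
```

The idea is that the wildcard clause also matches the longer grams that contain au (pau, aul), so highlighting with the full query covers the whole word, whereas highlighting with only the match clause should tag just the au gram. The caveat above still applies: the highlights no longer necessarily reflect why the document matched.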

