Configure highlighted part

Main question
The user is looking for a name and enters part of it, say au, and the document with the text paul is found.
I would like the match to be highlighted as p<em>au</em>l.
How can I achieve this with a complex search query (a combination of match, prefix, and wildcard clauses used to tune relevance)?

Sub question
When do the highlight settings type, boundary_scanner and boundary_chars from the documentation come into play? In my tests described below, these settings don't change the highlighted part.

Try 1: Wildcard query with default analyzer

PUT myindex
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
POST myindex/_doc/1
{
    "name": "paul"
}
GET myindex/_search
{
    "query": {
        "wildcard": {"name": "*au*"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        },
        "type": "fvh",
        "boundary_scanner": "chars",
        "boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
    }
}

This search returns the highlight <em>paul</em>, but I need p<em>au</em>l.

Try 2: Match query with NGRAM analyzer
This one works as described in the Stack Overflow question "Highlighting part of word in elasticsearch".

PUT myindexngram
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": "2",
                    "max_gram": "3",
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            },
            "analyzer": {
                "index_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": [
                        "lowercase"
                    ]
                },
                "search_term_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "index_ngram_analyzer",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
POST myindexngram/_doc/1
{
    "name": "paul"
}
GET myindexngram/_search
{
    "query": {
        "match": {"name": "au"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        }
    }
}

This highlights p<em>au</em>l as desired, but:

  1. Highlighting depends on the query type, so combining match and wildcard in bool.should again results in <em>paul</em>.
  2. Highlighting is not affected at all by the type, boundary_scanner and boundary_chars settings.

Elasticsearch version 7.13.4

A highlighter works on terms, so only full terms can be highlighted, whatever the terms in your index are. In your second example, au can be highlighted because it is a term in the index, which is not the case in your first example.
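To see which terms each index actually holds for paul, the _analyze API can be pointed at the name field of both indices. This is only a sketch that reuses the index and field names from the question; with the mappings shown there, the first request should return the single term paul (offsets 0 to 4), while the second should list the n-grams pa, pau, au, aul and ul, each with its own offsets:

```console
GET myindex/_analyze
{
    "field": "name",
    "text": "paul"
}

GET myindexngram/_analyze
{
    "field": "name",
    "text": "paul"
}
```

The wildcard query *au* matches the term paul as a whole, so in myindex the highlighter has no smaller unit to wrap in <em> tags.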
There is also an option to define your own highlight_query, which can differ from the main query, but this could lead to unpredictable highlights.
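As a rough sketch of the highlight_query option, assuming the myindexngram index from Try 2 and a hypothetical bool query standing in for the real relevance query, the highlighter can be given a plain match on the n-gram field while the main query keeps its wildcard clause:

```console
GET myindexngram/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "name": "au" } },
                { "wildcard": { "name": "*au*" } }
            ]
        }
    },
    "highlight": {
        "fields": {
            "name": {
                "highlight_query": {
                    "match": { "name": "au" }
                }
            }
        }
    }
}
```

The expectation is that only the terms matched by highlight_query are wrapped, giving p<em>au</em>l, but this has not been verified against 7.13.4. A highlight_query that diverges from the main query can also highlight text that did not contribute to the hit.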
