Implicit versus explicit definition of the fast vector highlighter

sbruinsje · August 12, 2022, 8:40am

Hi there,

I was under the impression that if a field's mapping sets term_vector to 'with_positions_offsets', the fast vector highlighter (FVH) is used automatically for that field. I thought so because of this comment in the docs: "The fast vector highlighter will be used by default for the text field because term vectors are enabled." The comment can be found at the bottom of this page: term_vector | Elasticsearch Guide [8.3] | Elastic. Do I understand this correctly?

It doesn't seem to be the case, though: when I explicitly set the highlighter type to fvh in the query, the query takes much longer (sometimes a 5-10x increase). If I leave out the explicit highlighter type, it's fast.

"fields": {
    "versions.title": {
        "type": "fvh", <------- Including this line makes the query much slower
        "number_of_fragments": 0
    },
    "versions.body": {
        "type": "fvh", <------- Including this line makes the query much slower
        "number_of_fragments": 0
    }
}

My questions are:

1. Given my mappings and query below, which highlighter is supposed to be used for the title and body?
2. If the FVH is supposed to be used by default, why would setting it explicitly suddenly make the query take much longer?
3. Is there some way to see which highlighter is being applied?

The mappings:

GET article_set_index/_mappings
{
  "article_set_index": {
    "mappings": {
      "dynamic": "false",
      "properties": {
        "authorIds": {
          "type": "keyword"
        },
        "language": {
          "type": "keyword"
        },
        "uniquePublicationDates": {
          "type": "date"
        },
        "uniqueSerialIds": {
          "type": "keyword"
        },
        "versions": {
          "type": "nested",
          "properties": {
            "authorIds": {
              "type": "keyword"
            },
            "body": {
              "type": "text",
              "term_vector": "with_positions_offsets",
              "analyzer": "folding"
            },
            "created": {
              "type": "date"
            },
            "language": {
              "type": "keyword"
            },
            "publicationDate": {
              "type": "date"
            },
            "published": {
              "type": "date"
            },
            "serialId": {
              "type": "keyword"
            },
            "title": {
              "type": "text",
              "term_vector": "with_positions_offsets",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}

Query without explicitly setting the fvh:

POST article_set_index/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "versions",
      "score_mode": "max",
      "inner_hits": {
        "size": 100,
        "highlight": {
          "pre_tags": [
            "<mark>"
          ],
          "post_tags": [
            "</mark>"
          ],
          "fields": {
            "versions.title": {
              "number_of_fragments": 0
            },
            "versions.body": {
              "number_of_fragments": 0
            }
          }
        },
        "sort": {
          "versions.published": {
            "order": "asc"
          }
        }
      },
      "query": {
        "bool": {
          "must": [
            {
              "dis_max": {
                "queries": [
                  {
                    "simple_query_string": {
                      "query": """((startup) | ("start-up") | (tech* onderneming*) -("sprout update")""",
                      "default_operator": "and",
                      "fields": [
                        "versions.title",
                        "versions.body"
                      ]
                    }
                  },
                  {
                    "bool": {
                      "must": [
                        {
                          "simple_query_string": {
                            "query": """(technol*) | (techiek) | (innovat*) | ((startup) | ("start-up") | (tech*) | (artificial) | (kunstmatig*)""",
                            "default_operator": "and",
                            "fields": [
                              "versions.title",
                              "versions.body"
                            ]
                          }
                        }
                      ],
                      "filter": [
                        {
                          "terms": {
                            "versions.authorIds": [
                              "erwin-boogert-9395"
                            ]
                          }
                        }
                      ]
                    }
                  }
                ],
                "tie_breaker": 0.7
              }
            }
          ],
          "filter": [
            {
              "terms": {
                "versions.language": [
                  "nl"
                ]
              }
            },
            {
              "range": {
                "versions.publicationDate": {
                  "gte": "2022-02-11",
                  "lte": "2022-08-11"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "totalAuthorCount": {
      "cardinality": {
        "field": "authorIds",
        "precision_threshold": 100
      }
    },
    "authors": {
      "terms": {
        "field": "authorIds",
        "size": 121,
        "shard_size": 400
      },
      "aggs": {
        "articles": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  },
  "timeout": "180000ms"
}
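
On the third question, one rough way to infer which highlighter runs by default is to send the same query once per highlighter type and compare the took value of each response with the run that sets no type. The sketch below only builds the highlight "fields" variants; the helper name highlight_variants is hypothetical, and it assumes the three type values the highlight API accepts (unified, plain, fvh). Sending the requests and reading took is left to whichever client is in use.

```python
import copy

# Values accepted by the "type" option of a highlight field.
HIGHLIGHTER_TYPES = ("unified", "plain", "fvh")


def highlight_variants(fields):
    """Return one highlight "fields" block per variant (hypothetical helper).

    The key None is the implicit variant (no "type" set); every other key
    forces that highlighter on all fields. The input is never mutated.
    """
    variants = {None: copy.deepcopy(fields)}
    for hl_type in HIGHLIGHTER_TYPES:
        forced = copy.deepcopy(fields)
        for options in forced.values():
            options["type"] = hl_type
        variants[hl_type] = forced
    return variants


# The highlight fields from the query in this post.
base_fields = {
    "versions.title": {"number_of_fragments": 0},
    "versions.body": {"number_of_fragments": 0},
}

variants = highlight_variants(base_fields)
for name, fields in variants.items():
    # Each block can replace "fields" under inner_hits.highlight in the query.
    print(name or "implicit", fields)
```

If the implicit run's timing and output match one of the explicit runs, that type is a reasonable guess for the default. This is a timing comparison, not a direct readout of the highlighter in use.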

system · September 9, 2022, 8:40am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
