Highlighting in a search query

dahlsm · May 31, 2024, 1:43pm

Hello, I'm quite new to Elasticsearch and am currently testing out its features. I'm sending the following query:

var body =
$$"""
{
  "query": {
    "query_string": {
      "query": "{{searchText}}"
    }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "*": {
        "fragment_size": 10,
        "number_of_fragments": 1
      }
    }
  }
}
""";

I've also tried the following variant with the same result:

var body =
$$"""
{
  "query": {
    "query_string": {
      "query": "{{searchText}}"
    }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "content": {}
    }
  }
}
""";

And here's the index mapping:

{
  "mappings": {
    "dynamic": "true",
    "dynamic_templates": [
      {
        "all_text_fields": {
          "match_mapping_type": "string",
          "mapping": {
            "analyzer": "iq_text_base",
            "fields": {
              "delimiter": {
                "analyzer": "iq_text_delimiter",
                "type": "text",
                "index_options": "freqs"
              },
              "joined": {
                "search_analyzer": "q_text_bigram",
                "analyzer": "i_text_bigram",
                "type": "text",
                "index_options": "freqs"
              },
              "prefix": {
                "search_analyzer": "q_prefix",
                "analyzer": "i_prefix",
                "type": "text",
                "index_options": "docs"
              },
              "enum": {
                "ignore_above": 2048,
                "type": "keyword"
              },
              "stem": {
                "analyzer": "iq_text_stem",
                "type": "text"
              }
            }
          }
        }
      }
    ],
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "content_embedding": {
        "type": "sparse_vector"
      },
      "id": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "is_truncated": {
        "type": "boolean"
      },
      "model_id": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      }
    }
  }
}

My issue: I'm getting the expected results back, but without any highlighting. Any suggestions as to what I'm missing?

joemcelroy · June 7, 2024, 9:46am

Hello,

Sorry for the late response.

This should work as you expect; I was able to test it locally. The highlight comes back in a separate object within the hit:

{
  "_index": "example",
  "_id": "HjPKgo8BHA_1IaiKhJSl",
  "_score": 0.2876821,
  "_source": {
    "content": "hello"
  },
  "highlight": {
    "content": [
      "<em>hello</em>"
    ]
  }
}
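
For anyone consuming the response in code, here is a minimal Python sketch of pulling those fragments out of a search response. The `collect_highlights` helper and the second hit are made up for illustration. The only assumption about Elasticsearch is that `highlight` sits next to `_source` in each hit and is simply absent when nothing was highlighted for that hit.

```python
def collect_highlights(response: dict) -> dict:
    """Map each hit's _id to its highlight fragments, keyed by field name."""
    highlights = {}
    for hit in response.get("hits", {}).get("hits", []):
        # "highlight" is a sibling of "_source"; fall back to {} when it is missing.
        highlights[hit["_id"]] = hit.get("highlight", {})
    return highlights


# Trimmed-down response: the first hit is the one shown above,
# the second is a hypothetical hit that matched without any highlight.
response = {
    "hits": {
        "hits": [
            {
                "_index": "example",
                "_id": "HjPKgo8BHA_1IaiKhJSl",
                "_score": 0.2876821,
                "_source": {"content": "hello"},
                "highlight": {"content": ["<em>hello</em>"]},
            },
            {
                "_index": "example",
                "_id": "hypothetical-id",
                "_score": 0.1,
                "_source": {"content": "greetings"},
            },
        ]
    }
}

result = collect_highlights(response)
for doc_id, fields in result.items():
    print(doc_id, fields or "(no highlight)")
```

Checking for a missing `highlight` key, rather than assuming it is always there, makes it easier to tell "the query matched but nothing was highlighted" apart from a parsing problem in the client.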

Joe

dahlsm · June 7, 2024, 10:08am

Hi, thanks for testing. An important detail is that I was doing this via the .NET search client (Elastic.Transport). Could that affect the results?

dahlsm · June 7, 2024, 10:21am

Hello and thanks again - your response made me realize something: I didn't see highlights because I was testing semantic search with ELSER, i.e. I was typing terms that don't exist in the text but that do have semantic matches. When I typed terms that actually exist in the indexed text, I got highlighting of those terms as expected!

But as a follow-up, do you know if there is any way to highlight semantic hits? As an example, if I search for "toxic" I get the document I'm looking for in return (I assume because it contains contextually similar terms like "contaminate", "discharged water", etc.), but no highlighting because the word "toxic" is not actually in the document. Is it possible to highlight whatever the semantic search has indicated as contextually similar to what I searched for?

Best regards,
Silje

joemcelroy · June 7, 2024, 9:24pm

This is something we have discussed internally, but unfortunately we don't support it, and it won't be available in the near future. It's not the easiest thing to do.

At a stretch, you could do this on the client side by tokenizing the text and finding the words that are most similar to the query. Another option is to chunk the text into small passages (a sentence or two) and embed them, then at query time surface the passages most similar to the query.
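
The passage idea could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `split_passages`, `top_passages`, and `toy_embed` are hypothetical names, the sentence splitter is deliberately naive, and `toy_embed` is a hand-written stand-in that groups a few words into two made-up "concepts". A real setup would pass in a function that calls an actual embedding model instead.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def split_passages(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_passages(
    query: str, text: str, embed: Callable[[str], Vector], k: int = 1
) -> List[Tuple[float, str]]:
    """Return the k passages of `text` most similar to `query`, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(p)), p) for p in split_passages(text)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# ASSUMPTION: toy stand-in for a real embedding model, for demonstration only.
# Dimension 0 is a made-up "pollution" concept, dimension 1 a "scheduling" one.
TOY_CONCEPTS = {
    "toxic": 0, "contaminate": 0, "discharged": 0,
    "meeting": 1, "schedule": 1, "tuesday": 1,
}


def toy_embed(text: str) -> List[float]:
    vec = [0.0, 0.0]
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in TOY_CONCEPTS:
            vec[TOY_CONCEPTS[word]] += 1.0
    return vec


doc = (
    "The meeting schedule was moved to Tuesday. "
    "Discharged water may contaminate the river."
)
best = top_passages("toxic", doc, toy_embed, k=1)
print(best[0][1])
```

With the toy embedding, the query "toxic" surfaces the sentence about discharged water even though the word itself never appears in it. The client can then wrap that passage in its own highlight markup.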

Joe

dahlsm · June 10, 2024, 7:19am

Thank you for the very helpful information! I'll look into your suggestions on embedding as well.

Best regards,
Silje
