Highlighting in a search query

Hello, I'm quite new to Elasticsearch and am currently testing out its features. I'm sending the following query:

var body =
$$"""
 {
   "query": {
     "query_string": {
       "query": "{{searchText}}"
     }
   },
   "highlight": {
     "require_field_match": false,
     "fields": {
       "*": {
         "fragment_size": 10,
         "number_of_fragments": 1
       }
     }
   }
 }
 """;

I've also tried the following variant with the same result:

var body =
$$"""
 {
   "query": {
     "query_string": {
       "query": "{{searchText}}"
     }
   },
   "highlight": {
     "require_field_match": false,
     "fields": {
       "content": {}
     }
   }
 }
 """;

And here's the index mapping:

{
  "mappings": {
    "dynamic": "true",
    "dynamic_templates": [
      {
        "all_text_fields": {
          "match_mapping_type": "string",
          "mapping": {
            "analyzer": "iq_text_base",
            "fields": {
              "delimiter": {
                "analyzer": "iq_text_delimiter",
                "type": "text",
                "index_options": "freqs"
              },
              "joined": {
                "search_analyzer": "q_text_bigram",
                "analyzer": "i_text_bigram",
                "type": "text",
                "index_options": "freqs"
              },
              "prefix": {
                "search_analyzer": "q_prefix",
                "analyzer": "i_prefix",
                "type": "text",
                "index_options": "docs"
              },
              "enum": {
                "ignore_above": 2048,
                "type": "keyword"
              },
              "stem": {
                "analyzer": "iq_text_stem",
                "type": "text"
              }
            }
          }
        }
      }
    ],
    "properties": {
      "content": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "content_embedding": {
        "type": "sparse_vector"
      },
      "id": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "is_truncated": {
        "type": "boolean"
      },
      "model_id": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      }
    }
  }
}

My issue: I'm getting the expected results back, but without any highlighting. Any suggestions as to what I'm missing?
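
A side note on the two request bodies above: interpolating the search text straight into a JSON string produces invalid JSON as soon as the text contains a double quote or a backslash. Below is a minimal sketch, in Python rather than C# and purely for illustration, of building the same body as a data structure and letting the serializer handle escaping. The function name `build_highlight_query` is hypothetical; the body itself mirrors the second variant above.

```python
import json


def build_highlight_query(search_text: str) -> str:
    """Serialize the search request body.

    json.dumps escapes quotes and backslashes in search_text, so the
    result is always valid JSON regardless of what the user typed.
    """
    body = {
        "query": {"query_string": {"query": search_text}},
        "highlight": {
            "require_field_match": False,
            "fields": {"content": {}},
        },
    }
    return json.dumps(body)


# A search text containing double quotes survives a round trip intact.
request_json = build_highlight_query('discharged "water"')
round_tripped = json.loads(request_json)
```

The same idea should carry over to C# by serializing an object with a JSON serializer instead of using a raw string literal.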

Hello,

Sorry for the late response.

This should work as you expect; I was able to test it locally. The highlight comes back in a separate object within each hit:

{
  "_index": "example",
  "_id": "HjPKgo8BHA_1IaiKhJSl",
  "_score": 0.2876821,
  "_source": {
    "content": "hello"
  },
  "highlight": {
    "content": [
      "<em>hello</em>"
    ]
  }
}
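
If it helps when checking the raw response, here is a rough sketch (in Python, just for illustration) of pulling the highlight object out of each hit. It assumes the standard search response envelope where hits sit under `hits.hits`; the function name `collect_highlights` and the second hit's `_id` are made up for the example.

```python
from typing import Any


def collect_highlights(response: dict[str, Any]) -> dict[str, dict[str, list[str]]]:
    """Map each hit's _id to its highlight object.

    Hits that matched without producing any highlight fragments have no
    "highlight" key at all, so they map to an empty dict here.
    """
    hits = response.get("hits", {}).get("hits", [])
    return {hit["_id"]: hit.get("highlight", {}) for hit in hits}


# Sample response: the first hit is the one shown above, the second is a
# hypothetical hit that matched but has no highlight object.
sample = {
    "hits": {
        "hits": [
            {
                "_index": "example",
                "_id": "HjPKgo8BHA_1IaiKhJSl",
                "_score": 0.2876821,
                "_source": {"content": "hello"},
                "highlight": {"content": ["<em>hello</em>"]},
            },
            {
                "_index": "example",
                "_id": "hypothetical-id-2",
                "_score": 0.1,
                "_source": {"content": "greetings"},
            },
        ]
    }
}

highlights = collect_highlights(sample)
```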

Joe

Hi, thanks for testing. An important detail is that I was doing this via the .NET search client (Elastic.Transport). Could that affect the results?

Hello and thanks again - your response made me realize something: I didn't see highlights because I was testing semantic search with ELSER, i.e. I was typing terms that don't exist in the text but that do have semantic matches. Typing in terms that directly existed in the indexed text, I was able to get highlighting of those terms as expected!

But as a follow-up, do you know if there is any way to highlight semantic hits? As an example, if I search for "toxic" I get the document I'm looking for in return (I assume because it contains contextually similar terms like "contaminate", "discharged water", etc.), but no highlighting because the word "toxic" is not actually in the document. Is it possible to highlight whatever the semantic search has indicated as contextually similar to what I searched for?

Best regards,
Silje

This is something we have discussed internally, but unfortunately we don't support it and won't in the near future. It's not the easiest thing to do.

At a stretch, you could do this on the client side by tokenizing the text and finding the words that are most similar to the query. Another option would be to chunk the text into small passages (a sentence or two) and embed them. At query time, you would then surface the passages most similar to the query.
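
To make the passage idea a bit more concrete, here is a minimal client-side sketch in Python. Everything in it is an assumption for illustration: the function names are hypothetical, and `toy_embed` is only a word-count placeholder that matches literal terms. To surface semantic matches such as "toxic" against "contaminate", you would pass a real embedding model as the `embed` argument instead.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Placeholder embedding: lowercase word counts (lexical only)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_passages(text: str, sentences_per_passage: int = 1) -> list[str]:
    """Split text at sentence-ending punctuation and group into passages."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = sentences_per_passage
    return [" ".join(sentences[i:i + step]) for i in range(0, len(sentences), step)]


def top_passages(
    query: str,
    text: str,
    embed: Callable[[str], Vector] = toy_embed,
    k: int = 1,
) -> list[tuple[float, str]]:
    """Return the k passages most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(p)), p) for p in split_passages(text)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


document = "The plant discharged water into the river. Fish stocks recovered last year."
best = top_passages("discharged water", document)
```

The passages returned by `top_passages` could then be shown to the user in place of a highlight fragment.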

Joe

Thank you for the very helpful information! I'll look into your suggestions on embedding as well :)

Best regards,
Silje
