Can Elasticsearch provide offsets of highlighted words?

We have the OCRed text of PDFs indexed and searchable in Elasticsearch.
We also store the original PDFs. We pass the highlighted terms from Elasticsearch in the URL and highlight those words in the PDF with a custom library.

However, to support more advanced queries such as proximity search, we need the offsets (positions) of the highlighted words directly from Elasticsearch.
With such queries, not every occurrence of a term from the phrase is highlighted, only those that satisfy the distance condition:

Example:

# index & document creation
PUT dominik_test_search/_doc/testing
{
  "content":{
    "DOC_TEXT":"""   Property damage covered under this insurance shall mean physical damage to the substance of property. 


   Physical damage to the substance of property shall not include corruption to data or software, in particular any 
   detrimental change in data, software or computer programs that is caused by a deletion, a corruption or a 
   deformation of the original structure.
   
   Property bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla damage bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla include bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla data"""
  }
}
# proximity query
GET dominik_test_search/_search
{
  "query": {
    "query_string": {
      "default_field": "content.DOC_TEXT",
      "query": "\"property damage include data\"~10"
    }
  },
  "highlight": {
    "fields": {
      "content.DOC_TEXT": {
        "highlight_query": {
          "query_string": {
            "fields": [
              "content.DOC_TEXT"
            ],
            "query": "\"property damage include data\"~10"
          }
        },
        "type": "unified",
        "boundary_scanner": "sentence",
        "fragment_size": 1000,
        "number_of_fragments": 1,
        "no_match_size": 1000,
        "fragmenter": "span"
      }
    }
  }
}

Result:

Question:
Can Elasticsearch provide, in any way, the offsets of highlighted words?
(We are using ES 7.16.2 in our clusters.)

Thanks in advance

Any ideas, anyone? Is it possible to adjust any of the existing highlighters to obtain offsets for the highlighted terms? I have with_positions_offsets enabled on the highlighted fields.
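For what it's worth, one possible workaround that does not need the highlighter itself to report offsets is to compute them on the client from the highlighted text. This is only a sketch under assumptions: it assumes "number_of_fragments" is set to 0 so that the whole field comes back highlighted (instead of 1 as in the query above), that the default <em> and </em> tags are used, and that the OCRed text never contains those tag strings itself. The function name highlight_offsets is made up for illustration. Offsets are counted in Python string indices, which may differ from what the PDF library expects.

```python
import re


def highlight_offsets(fragment, pre_tag="<em>", post_tag="</em>"):
    """Strip highlight tags and return (plain_text, offsets).

    offsets is a list of (start, end, term) tuples, where start/end index
    into plain_text (the fragment with the tags removed).
    """
    pattern = re.compile(
        re.escape(pre_tag) + "(.*?)" + re.escape(post_tag), re.DOTALL
    )
    plain_parts = []
    offsets = []
    cursor = 0     # read position in the tagged fragment
    plain_len = 0  # length of the untagged text emitted so far

    for match in pattern.finditer(fragment):
        # Copy the untagged text that precedes this highlighted term.
        before = fragment[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)

        # Record where the term lands in the untagged text.
        term = match.group(1)
        offsets.append((plain_len, plain_len + len(term), term))
        plain_parts.append(term)
        plain_len += len(term)
        cursor = match.end()

    # Copy whatever follows the last highlighted term.
    plain_parts.append(fragment[cursor:])
    return "".join(plain_parts), offsets


if __name__ == "__main__":
    # Hypothetical highlighted fragment, shaped like a unified highlighter result.
    sample = "<em>Property</em> bla <em>damage</em> bla <em>include</em> bla <em>data</em>"
    text, spans = highlight_offsets(sample)
    for start, end, term in spans:
        print(start, end, term)
```

If the whole field is returned, the plain text produced here should line up with the stored DOC_TEXT, so the offsets can be handed to the PDF highlighter instead of bare terms. Whether that holds for a given mapping and encoder setting would need to be checked against real responses.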

@Christian_Dahlqvist - can you refer someone from Elastic who might know? Thanks

This forum is manned by volunteers, some of whom are employed by Elastic. Although it is an active forum, there is no guarantee of a response, nor any SLAs. It is generally considered rude to ping people not already involved in the thread, so please refrain from doing so.

If you do require guaranteed responses and SLAs, Elastic do sell commercial subscriptions that offer this.

Sorry Christian, I was not aware of this rule (not pinging people who are not involved in the thread).
Thanks for the answer anyway, I will follow it from now on.
