Can Elasticsearch provide offsets of highlighted words?

astrodi · January 25, 2022, 9:49am

We have the OCRed text of PDFs indexed and searchable in Elasticsearch.
We also store the original PDFs: we pass the terms highlighted by Elasticsearch in the URL, and a custom library highlights those words in the PDF.

However, to support more advanced queries (proximity search), we need the offsets (positions) of the highlighted words directly from Elasticsearch.
With such queries, not every occurrence of the phrase terms is highlighted, only those that satisfy the distance condition:

Example:

# index & document creation
PUT dominik_test_search/_doc/testing
{
  "content":{
    "DOC_TEXT":"""   Property damage covered under this insurance shall mean physical damage to the substance of property. 


   Physical damage to the substance of property shall not include corruption to data or software, in particular any 
   detrimental change in data, software or computer programs that is caused by a deletion, a corruption or a 
   deformation of the original structure.
   
   Property bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla damage bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla include bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla data"""
  }
}
# proximity query
GET dominik_test_search/_search
{
  "query": {
    "query_string": {
      "default_field": "content.DOC_TEXT",
      "query": "\"property damage include data\"~10"
    }
  },
  "highlight": {
    "fields": {
      "content.DOC_TEXT": {
        "highlight_query": {
          "query_string": {
            "fields": [
              "content.DOC_TEXT"
            ],
            "query": "\"property damage include data\"~10"
          }
        },
        "type": "unified",
        "boundary_scanner": "sentence",
        "fragment_size": 1000,
        "number_of_fragments": 1,
        "no_match_size": 1000,
        "fragmenter": "span"
      }
    }
  }
}

Result:

Question:
Is there any way for Elasticsearch to provide the offsets of the highlighted words?
(We are using ES 7.16.2 in our clusters.)

Thanks in advance
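
One possible client-side workaround, sketched here under assumptions rather than as a built-in Elasticsearch feature: since the highlighter wraps matches in `pre_tags`/`post_tags` (`<em>` and `</em>` by default), the offsets can be recovered by stripping the tags from the returned fragment and locating the plain fragment in the original field text. The function name `highlight_offsets` is hypothetical, and the sketch assumes the fragment is a verbatim, unencoded substring of the source and that its first occurrence in the source is the right one.

```python
import re


def highlight_offsets(fragment, source_text, pre_tag="<em>", post_tag="</em>"):
    """Return (start, end, word) tuples with offsets relative to source_text.

    Assumption: `fragment` is a verbatim substring of `source_text` once the
    highlight tags are removed (no HTML encoding, no trimming inside words).
    """
    pattern = re.compile(
        re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag), re.DOTALL
    )
    plain_parts = []   # fragment pieces with the tags stripped
    spans = []         # (start, end, word) relative to the stripped fragment
    plain_len = 0
    cursor = 0
    for match in pattern.finditer(fragment):
        before = fragment[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        word = match.group(1)
        spans.append((plain_len, plain_len + len(word), word))
        plain_parts.append(word)
        plain_len += len(word)
        cursor = match.end()
    plain_parts.append(fragment[cursor:])
    plain = "".join(plain_parts)

    # Locate the stripped fragment in the original text (first occurrence).
    base = source_text.find(plain)
    if base < 0:
        raise ValueError("fragment not found in source text")
    return [(base + start, base + end, word) for start, end, word in spans]
```

Using unusual tags that cannot occur in the OCRed text (set through `pre_tags` and `post_tags` in the highlight request) would make the stripping step more robust. The approach only covers the fragments that the highlighter actually returns.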

astrodi · January 31, 2022, 8:31am

Any ideas, anyone? Is it possible to adjust any of the existing highlighters to obtain offsets for the highlighted terms? I have with_positions_offsets enabled on the highlighted fields.
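
For reference, a field mapped with `with_positions_offsets` can be inspected through the term vectors API (`GET <index>/_termvectors/<id>` with `positions` and `offsets` enabled), which lists every token of a term with its `position`, `start_offset` and `end_offset`. The following is a minimal sketch of reading such a response; the helper name `term_offsets` and the sample response values are hypothetical. Note that this returns all occurrences of a term, so the distance condition of a proximity query would still have to be applied by the caller, and that terms appear in their analyzed form (for example lowercased).

```python
def term_offsets(termvectors_response, field, terms):
    """Map each requested term to a list of (position, start_offset, end_offset).

    `termvectors_response` is assumed to be the parsed JSON body of a
    term vectors request made with positions and offsets enabled.
    """
    field_terms = (
        termvectors_response.get("term_vectors", {})
        .get(field, {})
        .get("terms", {})
    )
    result = {}
    for term in terms:
        tokens = field_terms.get(term, {}).get("tokens", [])
        result[term] = [
            (tok["position"], tok["start_offset"], tok["end_offset"])
            for tok in tokens
        ]
    return result


# Hypothetical, shortened response used only to illustrate the shape.
sample_response = {
    "term_vectors": {
        "content.DOC_TEXT": {
            "terms": {
                "property": {
                    "term_freq": 2,
                    "tokens": [
                        {"position": 0, "start_offset": 3, "end_offset": 11},
                        {"position": 15, "start_offset": 97, "end_offset": 105},
                    ],
                },
                "damage": {
                    "term_freq": 1,
                    "tokens": [
                        {"position": 1, "start_offset": 12, "end_offset": 18},
                    ],
                },
            }
        }
    }
}

found = term_offsets(sample_response, "content.DOC_TEXT", ["property", "damage", "data"])
```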

astrodi · February 1, 2022, 10:36am

@Christian_Dahlqvist - could you point me to someone from Elastic who might know? Thanks

Christian_Dahlqvist · February 1, 2022, 10:49am

This forum is manned by volunteers, some of whom are employed by Elastic. Although it is an active forum, there is no guarantee of a response, nor are there any SLAs. It is generally considered rude to ping people not already involved in the thread, so please refrain from doing so.

If you do require guaranteed responses and SLAs, Elastic do sell commercial subscriptions that offer this.

astrodi · February 1, 2022, 11:18am

Sorry Christian, I was not aware of this rule (not pinging people who are not involved in the thread).
Thanks for the answer anyway, I will follow it from now on.

