Incorrect highlight when using Intervals query (Elastic 7.1.1)

Hello!
Could someone clarify how the intervals query interacts with highlighting in Elasticsearch 7.1.1?

I have been experimenting with the intervals query.
Mapping:
PUT test_intervals_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "issue_date": { "format": "epoch_second", "type": "date" },
      "title": { "type": "text" },
      "content": { "type": "text" }
    }
  }
}

I ran a search that combines an intervals query with highlighting:

POST test_intervals_index/_search
{
  "query": {
    "intervals": {
      "content": {
        "match": { "query": "new library", "max_gaps": 0, "ordered": true }
      }
    }
  },
  "highlight": {
    "fields": {
      "content": { "number_of_fragments": 0 }
    }
  }
}

However, the highlight object contains not only the highlighted phrase "new library" but also standalone occurrences of individual words from the search phrase, e.g. "library" with no "new" next to it.

For example, the query returns:

{
  "took": 334,
  "timed_out": false,
  "_shards": {
    "total": 9,
    "successful": 9,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5,
    "hits": [
      {
        "_index": "test_intervals_index",
        "_type": "_doc",
        "_id": "HtPUkmsB7JMoGnHxDfao",
        "_score": 0.5,
        "_source": {
          "issue_date": "1561507200",
          "title": "System.IO.Pipelines: High performance IO in .NET",
          "content": "System.IO.Pipelines is a new library that is designed to make it easier to do high performance IO in .NET. It’s a library targeting .NET Standard that works on all .NET implementations. Pipelines was born from the work the .NET Core team did to make Kestrel one of the fastest web servers in the industry. What started as an implementation detail inside of Kestrel progressed into a re-usable API that shipped in 2.1 as a first class BCL API (System.IO.Pipelines) available for all .NET developers."
        },
        "highlight": {
          "content": [
            "System.IO.Pipelines is a <em>new</em> <em>library</em> that is designed to make it easier to do high performance IO in .NET. It’s a <em>library</em> targeting .NET Standard that works on all .NET implementations. Pipelines was born from the work the .NET Core team did to make Kestrel one of the fastest web servers in the industry. What started as an implementation detail inside of Kestrel progressed into a re-usable API that shipped in 2.1 as a first class BCL API (System.IO.Pipelines) available for all .NET developers."
          ]
        }
      }
    ]
  }
}

Is this behavior correct? Or do I need to add some options to the highlight part of the request?

Unfortunately, our highlighters don't currently handle intervals correctly; they fall back to extracting the individual terms and highlighting those. We're working on an improved highlighter that will use the underlying Lucene Matches API, but it's a way off yet.
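
To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is not Elasticsearch or Lucene code: the function names (`highlight_terms`, `highlight_intervals`), the regex tokenizer, and the shortened sample text are all assumptions made for illustration. The first function mimics the term-extraction fallback described above; the second only highlights terms that appear in order with no gaps, which is what the question expected.

```python
import re

def tokenize(text):
    """Lowercased word tokens with character offsets (a rough stand-in for an analyzer)."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

def wrap(text, spans):
    """Wrap each (start, end) character span in <em> tags."""
    out, pos = [], 0
    for start, end in sorted(set(spans)):
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)

def highlight_terms(text, query):
    """Term-extraction fallback: highlight every occurrence of any query term."""
    terms = set(query.lower().split())
    return wrap(text, [(s, e) for tok, s, e in tokenize(text) if tok in terms])

def highlight_intervals(text, query):
    """Interval-aware: highlight terms only where they occur in order with no gaps."""
    terms = query.lower().split()
    tokens = tokenize(text)
    spans = []
    for i in range(len(tokens) - len(terms) + 1):
        window = tokens[i:i + len(terms)]
        if [tok for tok, _, _ in window] == terms:
            spans.extend((s, e) for _, s, e in window)
    return wrap(text, spans)

# Shortened, hypothetical version of the document from the question.
text = "Pipelines is a new library for IO. It's a library targeting .NET Standard."
print(highlight_terms(text, "new library"))      # both "library" occurrences are tagged
print(highlight_intervals(text, "new library"))  # only the "new library" phrase is tagged
```

The first call reproduces the output seen in the question (the second, standalone "library" is highlighted); the second call shows what a match-aware highlighter would be expected to return.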
