No highlighting with intervals/fuzzy queries

In a recent feature request, the intervals query was extended to support fuzzy rules.

I'm now trying to figure out why highlighting doesn't work with intervals/fuzzy whereas it does work with intervals/match and span_near/fuzzy.

Let's index a sample document:

POST test-index/_doc/1
{
  "text": "The quick brown fox jumps over the lazy dog"
}

Using the span_near/fuzzy query, it works:

POST test-index/_search
{
  "highlight": {
    "number_of_fragments": 1,
    "fragment_size": 100,
    "fields": {
      "text": {
        "type": "unified"
      }
    }
  },
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "text": {
                  "fuzziness": "AUTO",
                  "value": "quick"
                }
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "text": {
                  "fuzziness": "AUTO",
                  "value": "lazy"
                }
              }
            }
          }
        }
      ],
      "slop": 5,
      "in_order": false
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "test-index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.119415164,
    "_source" : {
      "text" : "The quick brown fox jumps over the lazy dog"
    },
    "highlight" : {
      "text" : [
        "The <em>quick</em> brown fox jumps over the <em>lazy</em> dog"
      ]
    }
  }
]

Using the intervals/match query, it also works:

POST test-index/_search
{
  "highlight": {
    "number_of_fragments": 1,
    "fragment_size": 100,
    "fields": {
      "text": {
        "type": "unified"
      }
    }
  },
  "query": {
    "intervals": {
      "text": {
        "all_of": {
          "ordered": false,
          "max_gaps": 5,
          "intervals": [
            {
              "match": {
                "query": "quick"
              }
            },
            {
              "match": {
                "query": "lazy"
              }
            }
          ]
        }
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "test-index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.14285713,
    "_source" : {
      "text" : "The quick brown fox jumps over the lazy dog"
    },
    "highlight" : {
      "text" : [
        "The <em>quick</em> brown fox jumps over the <em>lazy</em> dog"
      ]
    }
  }
]

Finally, with the intervals/fuzzy query, it doesn't work (i.e. there is no highlight section in the response):

POST test-index/_search
{
  "highlight": {
    "number_of_fragments": 1,
    "fragment_size": 100,
    "fields": {
      "text": {
        "type": "unified"
      }
    }
  },
  "query": {
    "intervals": {
      "text": {
        "all_of": {
          "ordered": false,
          "max_gaps": 5,
          "intervals": [
            {
              "fuzzy": {
                "term": "quick"
              }
            },
            {
              "fuzzy": {
                "term": "lazy"
              }
            }
          ]
        }
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "test-index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.14285713,
    "_source" : {
      "text" : "The quick brown fox jumps over the lazy dog"
    }
  }
]

Would anyone have any idea why? @jimczi maybe?
Thank you so much!

@romseygeek can you take a look?

Hi @val, can you tell me which version of ES you're using?

Hi @AlanWoodward, thanks for dropping by.

Indeed, I forgot to mention the version. I'm using 7.6.1

This should be fixed in the upcoming 7.7 release. The issue is how the fuzzy intervals source implements its visit() method: 7.6 uses a version of Lucene that doesn't properly support visiting fuzzy automata. This has been fixed in Lucene 8.5 (LUCENE-9212), which will be used in Elasticsearch 7.7.
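
For readers wondering how a query can match a document and still produce no highlight, here is a rough toy model of the mechanism. It assumes the highlighter discovers what to mark by asking each rule to report its terms through a visitor. None of this is the Lucene API: VisitSketch, TermVisitor, Source, fuzzyReporting and fuzzySilent are hypothetical names, and the edit-distance check is only a stand-in for a fuzzy automaton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Toy model only: every type and method here is hypothetical, not Lucene's.
final class VisitSketch {

    /** Stand-in for a query visitor that a highlighter would pass to each rule. */
    interface TermVisitor {
        void consumeTerm(String term);                   // an exact term
        void consumeMatcher(Predicate<String> matcher);  // a term matcher (automaton stand-in)
    }

    /** Stand-in for one intervals rule (match, fuzzy, ...). */
    interface Source {
        void visit(TermVisitor visitor);
    }

    /** A match-like rule: reports its exact term. */
    static Source exact(String term) {
        return v -> v.consumeTerm(term);
    }

    /** A fuzzy rule whose visit() reports its matcher. */
    static Source fuzzyReporting(String term, int maxEdits) {
        return v -> v.consumeMatcher(token -> editDistance(term, token) <= maxEdits);
    }

    /** A fuzzy rule whose visit() reports nothing, so the highlighter never sees it. */
    static Source fuzzySilent(String term) {
        return v -> { };
    }

    /** Plain Levenshtein distance, used here instead of a real fuzzy automaton. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    /** Returns the tokens a highlighter could mark, based only on what visit() reported. */
    static List<String> highlightable(List<Source> sources, String text) {
        List<Predicate<String>> matchers = new ArrayList<>();
        TermVisitor collector = new TermVisitor() {
            @Override public void consumeTerm(String term) { matchers.add(term::equals); }
            @Override public void consumeMatcher(Predicate<String> m) { matchers.add(m); }
        };
        for (Source s : sources) s.visit(collector);

        List<String> hits = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            for (Predicate<String> m : matchers) {
                if (m.test(token)) { hits.add(token); break; }
            }
        }
        return hits;
    }
}
```

In this model, calling highlightable with two fuzzySilent rules returns an empty list even though the text contains both words, which mirrors the missing highlight section on 7.6.1. Swapping in fuzzyReporting makes the same tokens visible again. How the actual fix works is described in LUCENE-9212, not here.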

Fantastic, thanks for the insights @AlanWoodward !!
Eager to test this in ES 7.7 soon.

I'm happy to share that ES 7.7.0 (with Lucene 8.5) took care of this issue.