No highlighting with intervals/fuzzy queries

Following a recent feature request, the intervals query was extended to support fuzzy rules.

I'm now trying to figure out why highlighting doesn't work with intervals/fuzzy whereas it does work with intervals/match and span_near/fuzzy.

Let's index a sample document:

POST test-index/_doc/1
{
  "text": "The quick brown fox jumps over the lazy dog"
}

Using the span_near/fuzzy query, it works:

POST test-index/_search
{
  "highlight": {
    "number_of_fragments": 1,
    "fragment_size": 100,
    "fields": {
      "text": {
        "type": "unified"
      }
    }
  },
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "text": {
                  "fuzziness": "AUTO",
                  "value": "quick"
                }
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "fuzzy": {
                "text": {
                  "fuzziness": "AUTO",
                  "value": "lazy"
                }
              }
            }
          }
        }
      ],
      "slop": 5,
      "in_order": false
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "test-index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.119415164,
    "_source" : {
      "text" : "The quick brown fox jumps over the lazy dog"
    },
    "highlight" : {
      "text" : [
        "The <em>quick</em> brown fox jumps over the <em>lazy</em> dog"
      ]
    }
  }
]

Using the intervals/match query, it also works:

POST test-index/_search
{
  "highlight": {
    "number_of_fragments": 1,
    "fragment_size": 100,
    "fields": {
      "text": {
        "type": "unified"
      }
    }
  },
  "query": {
    "intervals": {
      "text": {
        "all_of": {
          "ordered": false,
          "max_gaps": 5,
          "intervals": [
            {
              "match": {
                "query": "quick"
              }
            },
            {
              "match": {
                "query": "lazy"
              }
            }
          ]
        }
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "test-index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.14285713,
    "_source" : {
      "text" : "The quick brown fox jumps over the lazy dog"
    },
    "highlight" : {
      "text" : [
        "The <em>quick</em> brown fox jumps over the <em>lazy</em> dog"
      ]
    }
  }
]

Finally, with the intervals/fuzzy query, it doesn't work (i.e. there is no highlight section in the response):

POST test-index/_search
{
  "highlight": {
    "number_of_fragments": 1,
    "fragment_size": 100,
    "fields": {
      "text": {
        "type": "unified"
      }
    }
  },
  "query": {
    "intervals": {
      "text": {
        "all_of": {
          "ordered": false,
          "max_gaps": 5,
          "intervals": [
            {
              "fuzzy": {
                "term": "quick"
              }
            },
            {
              "fuzzy": {
                "term": "lazy"
              }
            }
          ]
        }
      }
    }
  }
}

Response:

"hits" : [
  {
    "_index" : "test-index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.14285713,
    "_source" : {
      "text" : "The quick brown fox jumps over the lazy dog"
    }
  }
]

Would anyone have any idea why? @jimczi maybe?
Thank you so much!

@romseygeek can you take a look?

Hi @val, can you tell me which version of ES you're using?

Hi @AlanWoodward, thanks for dropping by.

Indeed, I forgot to mention the version. I'm using 7.6.1

This should be fixed in the upcoming 7.7 release. The issue is in how the fuzzy intervals source implements its visit() method: 7.6 uses a version of Lucene that doesn't properly support visiting fuzzy automata. This has been fixed in Lucene 8.5 (LUCENE-9212), which will be used in Elasticsearch 7.7.
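
To illustrate why a missing visit step leads to no highlight at all, here is a toy model in Python. It is only a sketch: every class and function name in it is hypothetical, it is not Lucene's API, and it assumes a fixed maximum of one edit rather than the fuzziness rules Elasticsearch applies. The idea is that the highlighter only knows which tokens to wrap from what each source reports when visited, so a source that reports nothing produces a hit without highlights.

```python
# Toy model only: these names are hypothetical and are not Lucene's API.

def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: advance in both strings
        j += 1      # otherwise skip the extra character in the longer string
    return True


class FuzzySourceWithoutVisit:
    """Matches fuzzily at search time but reports nothing when visited."""
    def __init__(self, term):
        self.term = term

    def visit(self, matchers):
        pass  # nothing is reported, so the highlighter learns nothing


class FuzzySourceWithVisit:
    """Reports its fuzzy matcher when visited."""
    def __init__(self, term):
        self.term = term

    def visit(self, matchers):
        matchers.append(lambda token: within_one_edit(token, self.term))


def highlight(text, sources):
    """Wrap every token accepted by a reported matcher in <em> tags."""
    matchers = []
    for source in sources:
        source.visit(matchers)
    out = []
    for token in text.split():
        if any(match(token.lower()) for match in matchers):
            out.append("<em>" + token + "</em>")
        else:
            out.append(token)
    return " ".join(out)


TEXT = "The quick brown fox jumps over the lazy dog"
```

Calling highlight(TEXT, ...) with two FuzzySourceWithoutVisit instances returns the text unchanged, whereas the same call with FuzzySourceWithVisit instances wraps "quick" and "lazy" in <em> tags.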


Fantastic, thanks for the insights @AlanWoodward !!
Eager to test this in ES 7.7 soon.

I'm happy to share that ES 7.7.0 (with Lucene 8.5) took care of this issue.
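
For anyone who wants to re-check this after an upgrade, here is a minimal Python sketch that builds the same intervals/fuzzy request body as above and reports which hits lack a highlight for the field. The helper names (intervals_fuzzy_search, hits_missing_highlight) are made up for illustration, and sending the body to the cluster is left out; the sample hits are trimmed from the responses shown earlier in this thread.

```python
import json


def intervals_fuzzy_search(field, terms, max_gaps=5):
    """Build the intervals/fuzzy search body used in this thread (hypothetical helper)."""
    return {
        "highlight": {
            "number_of_fragments": 1,
            "fragment_size": 100,
            "fields": {field: {"type": "unified"}},
        },
        "query": {
            "intervals": {
                field: {
                    "all_of": {
                        "ordered": False,
                        "max_gaps": max_gaps,
                        "intervals": [{"fuzzy": {"term": t}} for t in terms],
                    }
                }
            }
        },
    }


def hits_missing_highlight(hits, field):
    """Return the _id of every hit that has no highlight fragments for the field."""
    return [h["_id"] for h in hits if not h.get("highlight", {}).get(field)]


# Hits trimmed from the responses posted above.
hit_without_highlight = {
    "_id": "1",
    "_source": {"text": "The quick brown fox jumps over the lazy dog"},
}
hit_with_highlight = {
    "_id": "1",
    "_source": {"text": "The quick brown fox jumps over the lazy dog"},
    "highlight": {
        "text": ["The <em>quick</em> brown fox jumps over the <em>lazy</em> dog"]
    },
}

if __name__ == "__main__":
    # Print the body so it can be pasted into a POST test-index/_search request.
    print(json.dumps(intervals_fuzzy_search("text", ["quick", "lazy"]), indent=2))
```

An empty list from hits_missing_highlight means every hit came back with a highlight for that field.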
