Fast Vector Highlighter does not highlight synonym matches in 5.0.0-beta1 as it did in 2.4.0


(Alex Pang) #1

How to reproduce:

In 2.4.0 I created an index with the following settings and mappings:

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "format": "wordnet",
          "type": "synonym",
          "synonyms_path": "analysis/wordnet/english/wn_s.pl"
        }
      },
      "analyzer": {
        "synonym": {
          "type": "custom",
          "filter": [
            "synonym"
          ],
          "tokenizer": "standard",
          "ignore_case":"true"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": {
              "docs": {
                "similarity": "BM25"
              },
              "term_vector": "with_positions_offsets",
              "type": "string",
              "fields": {
                "raw": {
                  "ignore_above": 4000,
                  "index": "not_analyzed",
                  "type": "string"
                }
              }
            },
            "match_mapping_type": "string"
          }
        }
      ]
    }
  }
}

In 5.0.0-beta1 I created the index with these settings and mappings:

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "format": "wordnet",
          "type": "synonym",
          "synonyms_path": "analysis/wordnet/english/wn_s.pl"
        }
      },
      "analyzer": {
        "synonym": {
          "filter": [
            "synonym"
          ],
          "tokenizer": "standard",
          "ignore_case": "true"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": {
              "term_vector": "with_positions_offsets",
              "type": "text",
              "fields": {
                "raw": {
                  "ignore_above": 4000,
                  "index": true,
                  "type": "keyword"
                }
              }
            },
            "match_mapping_type": "string"
          }
        }
      ]
    }
  }
}

In both versions I added a document with:

PUT my_index/my_type/1
{
  "my_string":"the quick brown fox jumped over the lazy dog"
}

Then I searched for "fast", a synonym of "quick":

GET my_index/my_type/_search
{
  "query": {
    "match": {
      "my_string": {
        "query": "fast",
        "analyzer":"synonym"
      }
    }
  },
  "highlight": {
    "fields": {
      "my_string": {}
    }
  }
}

Result from Elasticsearch 2.4.0:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.0073345946,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.0073345946,
        "_source": {
          "my_string": "the quick brown fox jumped over the lazy dog"
        },
        "highlight": {
          "my_string": [
            "the <em>quick</em> brown fox jumped over the lazy dog"
          ]
        }
      }
    ]
  }
}

Result from Elasticsearch 5.0.0-beta1:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.27233246,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.27233246,
        "_source": {
          "my_string": "the quick brown fox jumped over the lazy dog"
        }
      }
    ]
  }
}

As shown above, the document matches as expected in both versions, but in 5.0.0-beta1 the response has no highlight section, so the synonym "quick" is not highlighted.

The main configuration difference is the dynamic template for strings, which uses "text" (with a "keyword" subfield for "raw") instead of "string", following the breaking changes described at https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_mapping_changes.html#_literal_string_literal_fields_replaced_by_literal_text_literal_literal_keyword_literal_fields

Is something wrong with my query or configuration in 5.0.0?


(Alex Pang) #2

If I force the highlighter type to plain, the synonym gets highlighted in 5.0.0-beta1 (but then I can't highlight phrases).
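For reference, a minimal sketch of that workaround: the highlighter is chosen per field with the `type` option in the highlight section, and `"plain"` is the Elasticsearch name for the plain highlighter. This assumes the same index, field and analyzer as above; the body would be sent to GET my_index/my_type/_search and only the highlight field settings differ from the original search.

```json
{
  "query": {
    "match": {
      "my_string": {
        "query": "fast",
        "analyzer": "synonym"
      }
    }
  },
  "highlight": {
    "fields": {
      "my_string": {
        "type": "plain"
      }
    }
  }
}
```

The plain highlighter re-analyzes the stored text instead of reading term vectors, which is presumably why it is unaffected here.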

Forcing the highlighter type to the fast vector highlighter in 2.4.0 still highlights the synonym, so has something changed in the fast vector highlighter in 5.0?
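A sketch of how the fast vector highlighter can be requested explicitly, assuming the same search request as above with only its highlight section replaced. `"fvh"` is the type name Elasticsearch uses for the fast vector highlighter, and it depends on the `"term_vector": "with_positions_offsets"` setting already present in both mappings.

```json
"highlight": {
  "fields": {
    "my_string": {
      "type": "fvh"
    }
  }
}
```

Because the field stores term vectors with positions and offsets, the fast vector highlighter is normally picked by default anyway, so this mainly makes the comparison between the two versions explicit.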


(Alex Pang) #3

Turns out it was a bug: https://github.com/elastic/elasticsearch/issues/20781

Looks like it's fixed in 5.0.0 GA.

