Error when indexing document and analysing with "word delimiter" in ES 6.0.0-rc1

PUT foo
{
  "index": {
    "analysis": {
      "filter": {
        "starts_with":    {"type": "edgeNGram", "max_gram": 50, "min_gram": 1},
        "contains":       {"type": "nGram", "max_gram": 50, "min_gram": 1},
        "catenate_words": {"type": "word_delimiter", "catenate_all": true, "preserve_original": true}
      },
      "analyzer": {
        "name_analyzer": {
          "tokenizer": "whitespace",
          "filter":    ["lowercase", "contains", "starts_with", "catenate_words"],
          "type":      "custom"
        }
      }
    }
  }
}

PUT foo/_mapping/type
{
  "properties": {
    "name": {"type": "text", "analyzer": "name_analyzer"}
  }
}

PUT foo/type/1
{
  "name": "SD500"
}

This will produce the following error message:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=5,lastStartOffset=2 for field 'name'"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=5,lastStartOffset=2 for field 'name'"
  },
  "status": 400
}

The word delimiter filter prevents a document from being indexed if one of its fields of type text contains letters and digits in the same word, e.g. "SD500".

Indexing "SD 500" returns no errors.
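To make the error message easier to read, here is a minimal Python sketch of the rule it states: offsets must be non-negative, endOffset must be >= startOffset, and startOffset must not go backwards. The function name `first_offset_violation` and the token stream are hypothetical. The tokens are made up to match the numbers in the error above and are not the real output of `name_analyzer`, which would also contain the ngram tokens.

```python
from typing import Iterable, Optional, Tuple

Token = Tuple[str, int, int]  # (term, start_offset, end_offset)


def first_offset_violation(tokens: Iterable[Token]) -> Optional[str]:
    """Return a description of the first token that breaks the offset rule, else None."""
    last_start = 0
    for term, start, end in tokens:
        # Rule as worded in the error message: non-negative start,
        # end >= start, and start must not be smaller than the previous start.
        if start < 0 or end < start or start < last_start:
            return (
                f"term={term!r} startOffset={start},endOffset={end},"
                f"lastStartOffset={last_start}"
            )
        last_start = start
    return None


# Hypothetical stream for "SD500": the split parts first, then the whole word.
hypothetical_tokens = [("sd", 0, 2), ("500", 2, 5), ("sd500", 0, 5)]
print(first_offset_violation(hypothetical_tokens))

# Hypothetical stream for "SD 500": two whitespace-separated tokens in order.
print(first_offset_violation([("sd", 0, 2), ("500", 3, 6)]))
```

In this sketch the first stream is rejected at the token that starts at 0 after a token that started at 2, which is the same combination of numbers the error reports. The second stream passes because its start offsets only increase.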

The same setup works with previous 5.x versions of Elasticsearch, but not with 6.0.0-rc1.

I can't find anything related to this in the breaking changes section of the documentation.


I am not a hundred percent sure what your intention is here.

Does it make sense to apply ngram and edgengram filters to the same field instead of using a multi-field? I fail to see the use case here.

Maybe explain in a bit more detail what you want to do from a user perspective, without any Elasticsearch mapping configuration?

--Alex
