PUT foo
{
  "index": {
    "analysis": {
      "filter": {
        "starts_with": {"type": "edgeNGram", "max_gram": 50, "min_gram": 1},
        "contains": {"type": "nGram", "max_gram": 50, "min_gram": 1},
        "catenate_words": {"type": "word_delimiter", "catenate_all": true, "preserve_original": true}
      },
      "analyzer": {
        "name_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "contains", "starts_with", "catenate_words"],
          "type": "custom"
        }
      }
    }
  }
}
PUT foo/_mapping/type
{
  "properties": {
    "name": {"type": "text", "analyzer": "name_analyzer"}
  }
}
PUT foo/type/1
{
  "name": "SD500"
}
Indexing this document fails with the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=5,lastStartOffset=2 for field 'name'"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=5,lastStartOffset=2 for field 'name'"
  },
  "status": 400
}
The word_delimiter filter prevents a document from being indexed when one of its text fields contains a word that mixes letters and digits, e.g. "SD500".
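To see what word_delimiter does with such a word without indexing anything, the filter can be run on its own through the `_analyze` API, which returns every token together with its `start_offset` and `end_offset`. The request below is a diagnostic sketch rather than part of the reproduction. It assumes the inline token filter definitions that `_analyze` accepts in the request body, and it reuses the `catenate_words` settings from the index above, leaving out the n-gram filters:

```
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {"type": "word_delimiter", "catenate_all": true, "preserve_original": true}
  ],
  "text": "SD500"
}
```

I would expect the response to contain the parts `sd` (offsets 0-2) and `500` (offsets 2-5). If a token starting at offset 0 is listed after `500`, that matches the `startOffset=0,endOffset=5,lastStartOffset=2` reported in the error above.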
Indexing "SD 500" returns no errors.
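To narrow down which stage of `name_analyzer` produces the out-of-order offsets, `_analyze` also accepts `"explain": true`, which reports the token stream after the tokenizer and after each token filter. This is a diagnostic sketch against the `foo` index created above. Running it once with `SD500` and once with `SD 500` should show where the two token streams diverge. Expect long output, because the n-gram filters expand every token:

```
GET foo/_analyze
{
  "analyzer": "name_analyzer",
  "text": "SD500",
  "explain": true
}

GET foo/_analyze
{
  "analyzer": "name_analyzer",
  "text": "SD 500",
  "explain": true
}
```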
This works in previous 5.x versions of Elasticsearch, but fails in 6.0.0-rc1.
I can't find anything related to this in the breaking changes section of the documentation.