Simple index using word delimiter resulting in startOffset error

Hi All,

I am trying to upgrade from Elasticsearch 5 to Elasticsearch 6.
I have a very simple index that uses the shingle, word_delimiter and edge_ngram token filters.

My example is below: I am trying to index "Bat man is cool".
I want customers to be able to search for "batman", so I use word_delimiter (with catenate_all) to join the words. I also want a partial entry such as "batm" to match, so I add edge n-grams as well.
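A rough Python sketch of the intended matching logic, not how Elasticsearch implements it. The helpers catenate_all and edge_ngrams are hypothetical and simplified: the first only joins the words of one shingle (the real word_delimiter filter also emits the parts), and the second mimics the edge_ngram settings used below (min_gram 1, max_gram 10).

```python
def catenate_all(shingle: str) -> str:
    # Hypothetical simplification of catenate_all: join the words of a
    # multi-word shingle, e.g. "bat man" -> "batman".
    return "".join(shingle.split())


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list:
    # Prefixes of length min_gram..max_gram, like the "edge" filter settings.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


joined = catenate_all("bat man")
grams = edge_ngrams(joined)
print(joined)  # batman
print(grams)   # ['b', 'ba', 'bat', 'batm', 'batma', 'batman']
```

Because "batm" is one of the indexed prefixes, a search for "batm" would find the document.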

The setup below works in ES5, but in ES6 indexing the document fails.

PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true
        },
        "edge": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        },
        "remove_plurals": {
          "type": "stemmer",
          "name": "minimal_english"
        },
        "shingles": {
          "type": "shingle",
          "max_shingle_size": 5,
          "min_shingle_size": 2,
          "output_unigrams": "true"
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingles",
            "word_joiner",
            "edge"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "default"
        }
      }
    }
  }
}
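For reference, here is a rough Python sketch (not Elasticsearch code) of what the lowercase and shingles steps of this analyzer should emit for "Bat man is cool", with character offsets. The tokenize and shingles functions are hypothetical simplifications: a regex stands in for the standard tokenizer, and the output order (unigram first, then longer shingles at each position) is assumed to roughly match the shingle filter.

```python
import re


def tokenize(text):
    # Hypothetical stand-in for the standard tokenizer plus the lowercase
    # filter: word characters only, keeping (term, startOffset, endOffset).
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def shingles(tokens, min_size=2, max_size=5, output_unigrams=True):
    # Each shingle spans from its first word's start to its last word's end.
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size > len(tokens):
                break
            window = tokens[i:i + size]
            out.append((" ".join(t[0] for t in window), window[0][1], window[-1][2]))
    return out


for tok in shingles(tokenize("Bat man is cool")):
    print(tok)
```

Every shingle starts at the offset of its first word, so the stream returns to offset 0 several times ("bat", "bat man", "bat man is", ...). If word_delimiter then splits each multi-word shingle back into words, a word such as "man" (offset 4) can be followed by the next shingle's "bat" (offset 0), which would be consistent with the numbers in the error below.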

POST /_bulk
{ "update" : { "_index" : "test", "_type" : "test", "_id" : "1" } }
{ "doc": { "field1" : "Bat man is cool"}, "doc_as_upsert": true}

Error:
startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=3,lastStartOffset=4 for field 'field1'
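The error message spells out three rules for each token's offsets. Below is a minimal Python sketch of those rules as quoted in the message (an illustration, not Lucene's actual check). check_offsets is a hypothetical helper, and the token list is a hypothetical sequence chosen only to reproduce the numbers in the error: a token starting at offset 4 followed by one spanning 0 to 3.

```python
def check_offsets(tokens):
    # Raise ValueError on the first token that breaks the rules quoted in
    # the error: startOffset >= 0, endOffset >= startOffset, and start
    # offsets never decreasing from one token to the next.
    last_start = 0
    for text, start, end in tokens:
        if start < 0 or end < start or start < last_start:
            raise ValueError(
                f"startOffset={start},endOffset={end},"
                f"lastStartOffset={last_start} for token {text!r}"
            )
        last_start = start


# Hypothetical token stream, chosen to match the numbers in the error above.
tokens = [("bat", 0, 3), ("man", 4, 7), ("bat", 0, 3)]
try:
    check_offsets(tokens)
except ValueError as err:
    print(err)  # startOffset=0,endOffset=3,lastStartOffset=4 for token 'bat'
```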

Can anyone explain why this happens or suggest a fix?
You can run the PUT and POST requests above on ES6 to reproduce the issue quickly.

Thank you in advance for your help,

Dev
