Hello,
I'm upgrading from Elastic 6.8.1 to 7.8.1 (tested on 8.2.3 as well) and getting the following error:
startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards
It happens when the text contains both a synonym match and a word with a delimiter character in it.
The change I had to make to my analyzer's filter chain during the upgrade was moving word_delimiter
from before the synonym filter to after it, so now I have:
"analyzer": {
"stemmed_en": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"synonym_en",
"word_delimiter",
"stemmer_en"
]
}
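For comparison, on 6.8.1 the same analyzer worked with the filters in their original order (word_delimiter before synonym_en); a sketch, with all other settings identical:

```json
"analyzer": {
  "stemmed_en": {
    "type": "custom",
    "tokenizer": "whitespace",
    "filter": [
      "lowercase",
      "word_delimiter",
      "synonym_en",
      "stemmer_en"
    ]
  }
}
```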
In addition, I have a mapping for a content field:
"content": {
"type": "text",
"analyzer": "stemmed_en",
"norms": false
}
For example, if I try to index the following document, where "email" has a synonym rule ("email, e mail"):
POST index_name/_doc/12345
{
"content": "email abc@def.com"
}
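For completeness, the synonym_en filter is defined along these lines (a sketch showing just the relevant rule):

```json
"filter": {
  "synonym_en": {
    "type": "synonym",
    "synonyms": [
      "email, e mail"
    ]
  }
}
```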
I'm getting the following error:
"type": "illegal_argument_exception",
"reason": "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=6,endOffset=17,lastStartOffset=14 for field 'content'"
Using the analyze API I get the following tokens:
{
  "tokens": [
    {
      "token": "email",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "e",
      "start_offset": 0,
      "end_offset": 5,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "abc",
      "start_offset": 6,
      "end_offset": 9,
      "type": "word",
      "position": 1
    },
    {
      "token": "def",
      "start_offset": 10,
      "end_offset": 13,
      "type": "word",
      "position": 2
    },
    {
      "token": "com",
      "start_offset": 14,
      "end_offset": 17,
      "type": "word",
      "position": 3
    },
    {
      "token": "mail",
      "start_offset": 6,
      "end_offset": 17,
      "type": "SYNONYM",
      "position": 3
    }
  ]
}
So it's clear that the synonym token's offsets are getting mixed up with the word-delimited tokens (the "mail" SYNONYM token at position 3 starts at offset 6, behind the last start offset of 14).
Can you help me understand the cause and how to fix it?
- Note: I have also tried using word_delimiter_graph instead of word_delimiter, but got the same error.
Thanks