Hi!
I am getting the following error:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "startOffset must be non-negative, and endOffset must be >= startOffset; got startOffset=14,endOffset=13"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "startOffset must be non-negative, and endOffset must be >= startOffset; got startOffset=14,endOffset=13"
  },
  "status": 400
}
```
when running the following call:
```
POST /_ml/trained_models/.multilingual-e5-small_linux-x86_64/_infer
{
  "docs": {
    "text_field": "bla bla bla ⅓ bla bla bla"
  }
}
```
I figured out that the error is caused by the '⅓' symbol.
Is there a way to filter such symbols out in my pipeline, so that only characters the model understands get through?
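One possible workaround, sketched under assumptions: Elasticsearch's `gsub` ingest processor replaces regex matches in a field, so it can strip the character before the document reaches the inference processor (processors run in order). The pipeline name `strip-vulgar-fractions` and the index name `my-index` are hypothetical, `text_field` comes from the request above, and the character class also covering ¼, ½, ¾, U+2150 to U+215F and ↉ is an assumption: only ⅓ is confirmed to trigger the error.

```
# Hypothetical pipeline: remove vulgar-fraction characters from text_field.
# In practice, add this gsub processor to your existing pipeline, before the inference processor.
PUT _ingest/pipeline/strip-vulgar-fractions
{
  "description": "Remove vulgar fractions (e.g. ⅓) before ML inference",
  "processors": [
    {
      "gsub": {
        "field": "text_field",
        "pattern": "[\u00BC-\u00BE\u2150-\u215F\u2189]",
        "replacement": ""
      }
    }
  ]
}

# Check the result without writing to any index.
POST _ingest/pipeline/strip-vulgar-fractions/_simulate
{
  "docs": [
    { "_source": { "text_field": "bla bla bla ⅓ bla bla bla" } }
  ]
}

# Reprocess existing documents (my-index is a placeholder; pass your
# embedding pipeline with the gsub processor added, not this one alone).
POST my-index/_update_by_query?pipeline=strip-vulgar-fractions
```

Because `gsub` overwrites `text_field` in place, the stored text loses the symbol too. Setting the processor's `target_field` option writes the cleaned copy to a separate field instead, which you would then feed to the model. Either way this only sidesteps the error; the offset failure itself looks like a tokenizer issue worth reporting.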
I am using Elastic Cloud V8.13.3 for this test.
I originally hit this error while using the _update_by_query API to update my index. Some feedback: please make it easier to see that the error comes from the ML model / inference pipeline. It took me a while to figure out that the symbol was the cause.
Kind regards,
Chenko