I want to use a whitespace tokenizer, but I have very long "tokens": if a token is longer than 255 characters, the tokenizer splits it into two or more tokens.
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "length",
      "min": 0,
      "max": 50
    }
  ],
  "text": "abcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmn123456789abcd"
}
and the result is:
{
  "tokens" : [
    {
      "token" : "abcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmn1",
      "start_offset" : 0,
      "end_offset" : 255,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "23456789abcd",
      "start_offset" : 255,
      "end_offset" : 267,
      "type" : "word",
      "position" : 1
    }
  ]
}
How can I increase the maximum token length so that tokens longer than 255 characters are not split?
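As far as I understand, the length filter in my request above only removes tokens outside the min/max range after tokenization; it does not change where the tokenizer splits. From what I can tell, the whitespace tokenizer has a max_token_length setting (default 255) that controls this, so I am considering defining a custom tokenizer in the index settings, roughly like the sketch below. The index name my-index, the names long_whitespace / long_whitespace_analyzer, and the value 400 are just placeholders I made up:

PUT my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "long_whitespace": {
          "type": "whitespace",
          "max_token_length": 400
        }
      },
      "analyzer": {
        "long_whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "long_whitespace"
        }
      }
    }
  }
}

To test it without creating an index first, I think the same tokenizer definition can be passed inline to _analyze:

GET _analyze
{
  "tokenizer": {
    "type": "whitespace",
    "max_token_length": 400
  },
  "text": "abcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmnopqrstuwyzabcdefghijklmn123456789abcd"
}

Is that the right approach, or is there a simpler way? Thanks!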