For example, I have the following settings:
PUT test_search_05
{
  "settings": {
    "analysis": {
      "analyzer": {
        "shingle": {
          "tokenizer": "standard",
          "filter": [ "my_shingle_filter" ]
        }
      },
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 5,
          "output_unigrams": false
        }
      }
    }
  }
}
Next, I analyze some text:
GET test_search_05/_analyze
{
  "analyzer": "shingle",
  "text": "quick brown fox"
}
I see:
{
  "tokens" : [
    {
      "token" : "quick brown",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "shingle",
      "position" : 0
    },
    {
      "token" : "quick brown fox",
      "start_offset" : 0,
      "end_offset" : 15,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "brown fox",
      "start_offset" : 6,
      "end_offset" : 15,
      "type" : "shingle",
      "position" : 1
    }
  ]
}
The positionLength is wrong in every token, one less than it should be: "quick brown" and "brown fox" each span two positions but carry no positionLength at all (so it defaults to 1), and "quick brown fox" spans three positions but reports positionLength 2.
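For reference, this is the output I would expect, with each shingle's positionLength equal to the number of source-token positions it covers (these values are hand-computed from the token spans, not actual Elasticsearch output):

{
  "tokens" : [
    {
      "token" : "quick brown",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "quick brown fox",
      "start_offset" : 0,
      "end_offset" : 15,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "brown fox",
      "start_offset" : 6,
      "end_offset" : 15,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 2
    }
  ]
}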
Without {"output_unigrams": false} it works correctly.
How can I resolve this problem?