Hi, I'm trying to use the limit token filter to cap the amount of data stored in Elasticsearch from a potentially very large text string in my main database.
I have the following filter and analyzer specified in my index settings:
analysis": {
"filter": {
"max_size_tokens": {
"type": "limit",
"max_token_count": "50"
}
},
"analyzer": {
"large_text_blobs": {
"filter": [
"lowercase",
"unique",
"max_size_tokens",
],
"type": "custom",
"tokenizer": "standard"
}
}
But analyzing a long string with this analyzer returns more than the 50 tokens I specified. (My example string is just the numbers 1 to 200, separated by a comma and a space; the analyzer returns all 200 tokens.)
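For reference, I'm testing with a request along these lines (my_index is just a placeholder for the real index name, and the text value is shortened here; the actual string runs all the way up to 200):

GET /my_index/_analyze
{
  "analyzer": "large_text_blobs",
  "text": "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
}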
BTW, this is on Elasticsearch 5.2.1, running locally, and setting the consume_all_tokens option on the filter does not appear to have any effect on this.
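For completeness, this is roughly how the filter looks with consume_all_tokens set explicitly (true is shown here just as an example value):

"max_size_tokens": {
  "type": "limit",
  "max_token_count": "50",
  "consume_all_tokens": true
}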