I sometimes have very long strings that I don't want to analyze completely.
So I tested the truncate filter, but somehow it fails on some strings.
I use ES 2.4.
First, here is my custom analyzer:
"analysis": {
  "analyzer": {
    "analyzer_keyword": {
      "filter": ["lowercase", "customTruncateFilter"],
      "tokenizer": "keyword"
    }
  },
  "filter": {
    "customTruncateFilter": {
      "type": "truncate",
      "length": 150
    }
  }
}
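To make clear what I expect from this chain, here is a minimal Python sketch of it. It is not Elasticsearch code: the function name analyze_keyword and the sample string are made up for illustration, and I am assuming that truncate simply keeps the first 150 characters of the single token produced by the keyword tokenizer.

```python
def analyze_keyword(text, length=150):
    """Rough stand-in for: keyword tokenizer -> lowercase -> truncate(length)."""
    token = text           # keyword tokenizer: the whole input is one token
    token = token.lower()  # lowercase filter
    return token[:length]  # truncate filter: keep only the first `length` characters


# Hypothetical input: 150 "X" characters followed by 50 "I" characters.
sample = "X" * 150 + "I" * 50
token = analyze_keyword(sample)

print(len(token))       # 150
print("xxxx" in token)  # True, the start of the string is kept
print("iiii" in token)  # False, everything after position 150 is cut
```

So anything that only occurs after the first 150 characters should not be searchable, while anything inside the first 150 characters should be.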
The following string is truncated correctly:
GET /advinion_chartsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzziiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii.php HTTP/1.1
I.e. I get a match for "xxxx", but not for "iiii".
The following string (length=255) is not:
Data Ascii: lGdj5WdmhCbhZXZ';function _0I0(data){var _1O0lOI="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";var o1,o2,o3,h1,h2,h3,h4,bits,i=0,enc='';do{h1=_1O0lOI.indexOf(data.charAt(i++));h2=_1O0lOI.indexOf(data.charAt(i++));h3=_1O0l
I.e. if I search for "1O0l", the document matches on this string.
I suspect it may be related to special characters in the string? (By the way, scripting is off, which is the default.)
What is also weird is that without the "truncate" filter, my cluster is smaller (68 GB vs. 70 GB with the filter).
I expected it to be smaller with the filter, since long strings are truncated.