I recently began running into this:
"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 338076"
Full error:
{"type"=>"illegal_argument_exception", "reason"=>"Document contains at least one immense term in field=\"some.field.keyword\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '...', original message: bytes can be at most 32766 in length; got 338076", "caused_by"=>{"type"=>"max_bytes_length_exceeded_exception", "reason"=>"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 338076"}}
This is on an ES cluster that was fairly recently upgraded to 5.x. I know the mapping changes for the new keyword type are reflected in the Elasticsearch template that Logstash installs, but I found it curious that the ignore_above setting is no longer included.
elastic-logstash-template-es2x.json:
"string_fields" : {
  "match" : "*",
  "match_mapping_type" : "string",
  "mapping" : {
    "type" : "string", "index" : "analyzed", "omit_norms" : true,
    "fielddata" : { "format" : "disabled" },
    "fields" : {
      "raw" : {"type": "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256}
    }
  }
}
elastic-logstash-template-es5x.json:
"string_fields" : {
  "match" : "*",
  "match_mapping_type" : "string",
  "mapping" : {
    "type" : "text", "norms" : false,
    "fields" : {
      "keyword" : { "type": "keyword" }
    }
  }
}
I am now installing my own template with the following change, which is closer to how things worked before: fields are mapped properly, values longer than 256 characters are not indexed in the .keyword sub-field, and so there is no aggregation on them (which is probably not desired for such values anyway).
--- elastic-logstash-template-es5x.json 2017-01-24 20:14:18.000000000 +0000
+++ elastic-logstash-template-es5x-fixed.json 2017-03-14 13:48:51.542094141 +0000
@@ -23,7 +23,7 @@
"mapping" : {
"type" : "text", "norms" : false,
"fields" : {
- "keyword" : { "type": "keyword" }
+ "keyword" : { "type": "keyword", "ignore_above": 256 }
}
}
}
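To illustrate why the one-line change avoids the error, here is a rough Python sketch of how I understand the two limits to interact. It is a simplified model, not Elasticsearch's actual code: the function name keyword_outcome and its return labels are made up for illustration, and Python's len() counts code points, which only approximates how Elasticsearch measures length for ignore_above. The 32766 figure is the byte limit quoted in the error above.

```python
from typing import Optional

LUCENE_MAX_TERM_BYTES = 32766  # byte limit quoted in the error message
IGNORE_ABOVE = 256             # value used in the patched template


def keyword_outcome(value: str, ignore_above: Optional[int] = IGNORE_ABOVE) -> str:
    """Hypothetical model of what happens to a value in a keyword sub-field.

    Returns "ignored", "rejected", or "indexed" (labels invented for this sketch).
    """
    # ignore_above is checked against the character length first:
    # over-long values are skipped for this sub-field only.
    if ignore_above is not None and len(value) > ignore_above:
        return "ignored"
    # Without ignore_above, the whole value becomes a single term,
    # and the term is limited by its UTF-8 byte length.
    if len(value.encode("utf-8")) > LUCENE_MAX_TERM_BYTES:
        return "rejected"
    return "indexed"


big = "x" * 338076  # same size as the value reported in the error
print(keyword_outcome(big, ignore_above=None))  # rejected (stock 5.x template)
print(keyword_outcome(big))                     # ignored (patched template)
print(keyword_outcome("short value"))           # indexed
```

Because one limit is in characters and the other in bytes, a multi-byte value can exceed the byte limit with fewer than 32766 characters, which is one more reason to keep ignore_above well below that figure.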
Was this an oversight, or a purposeful change? It seems to me this should still be default behavior.