Immense term and maximum length exception in elasticsearch output

I recently began running into this:

"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 338076"

Full error:

{"type"=>"illegal_argument_exception", "reason"=>"Document contains at least one immense term in field=\"some.field.keyword\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '...', original message: bytes can be at most 32766 in length; got 338076", "caused_by"=>{"type"=>"max_bytes_length_exceeded_exception", "reason"=>"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 338076"}}
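
For anyone trying to spot the offending values before they reach Elasticsearch: the limit in the error applies to the UTF-8 encoded length of a single term, so a keyword field (which indexes the whole value as one term) fails once the value's encoded size passes 32766 bytes. A minimal sketch of that check, where the helper names are hypothetical and not part of Logstash or Elasticsearch:

```python
# Maximum UTF-8 length of a single indexed term, as quoted in the error message.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of value."""
    return len(value.encode("utf-8"))


def is_immense_term(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """True if value, indexed whole as one term, would exceed the byte limit."""
    return utf8_length(value) > limit


if __name__ == "__main__":
    sample = "x" * 338076  # same size as the value reported in the error
    print(utf8_length(sample), is_immense_term(sample))
```

Note that the limit is in bytes, not characters, so text with many non-ASCII characters reaches it sooner than its character count suggests.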

This is on an ES cluster that was fairly recently upgraded to 5.x. I know that the Elasticsearch template Logstash installs reflects the mapping changes for the new keyword type, but I found it curious that ignore_above is no longer set.

elastic-logstash-template-es2x.json:

"string_fields" : {
  "match" : "*",
  "match_mapping_type" : "string",
  "mapping" : {
    "type" : "string", "index" : "analyzed", "omit_norms" : true,
    "fielddata" : { "format" : "disabled" },
    "fields" : {
      "raw" : {"type": "string", "index" : "not_analyzed", "doc_values" : true, "ignore_above" : 256}
    }
  }
}

elastic-logstash-template-es5x.json:

"string_fields" : {
  "match" : "*",
  "match_mapping_type" : "string",
  "mapping" : {
    "type" : "text", "norms" : false,
    "fields" : {
      "keyword" : { "type": "keyword" }
    }
  }
}

I am now installing my own template with the following change, which is closer to how things worked before: fields are still mapped properly, values longer than 256 characters are not indexed in the .keyword sub-field, and so they cannot be aggregated on (which is probably not wanted for such values anyway).

--- elastic-logstash-template-es5x.json	2017-01-24 20:14:18.000000000 +0000
+++ elastic-logstash-template-es5x-fixed.json	2017-03-14 13:48:51.542094141 +0000
@@ -23,7 +23,7 @@
           "mapping" : {
             "type" : "text", "norms" : false,
             "fields" : {
-              "keyword" : { "type": "keyword" }
+              "keyword" : { "type": "keyword", "ignore_above": 256 }
             }
           }
         }
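
If you would rather patch the template in code than maintain a diff, the same change can be sketched in Python. This assumes the template JSON keeps its dynamic templates under mappings -> (type name) -> dynamic_templates, as the 5.x Logstash template does as far as I can tell; the function name and the trimmed sample are hypothetical, and the sample contains only the string_fields rule quoted above.

```python
import copy
import json


def add_ignore_above(template: dict, limit: int = 256) -> dict:
    """Return a copy of template in which every keyword sub-field of a
    dynamic template has ignore_above set (existing values are kept)."""
    patched = copy.deepcopy(template)
    for mapping_def in patched.get("mappings", {}).values():
        if not isinstance(mapping_def, dict):
            continue
        for entry in mapping_def.get("dynamic_templates", []):
            # Each entry is a one-key dict: {"string_fields": {...rule...}}
            for rule in entry.values():
                fields = rule.get("mapping", {}).get("fields", {})
                for sub_field in fields.values():
                    if sub_field.get("type") == "keyword":
                        sub_field.setdefault("ignore_above", limit)
    return patched


# Trimmed, illustrative template: only the rule shown in this post.
SAMPLE_TEMPLATE = {
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "norms": False,
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                    }
                }
            ]
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(add_ignore_above(SAMPLE_TEMPLATE), indent=2))
```

The function works on a copy, so the loaded template is left untouched, and setdefault leaves alone any keyword sub-field that already has its own ignore_above.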

Was this an oversight, or a purposeful change? It seems to me this should still be default behavior.

This was ultimately tracked in https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/588, and fixed with https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/610, which did essentially the above.