Help identifying large message that is causing a Kafka RecordTooLargeException

Hello!

I'm running Logstash 6.6.0 on each node in a Kubernetes cluster. The agent uses a file input plugin to read logs from all containers running on the host, and a Kafka output plugin to ship those messages to Kafka. After a few hours I start receiving the following errors on every agent in the cluster:

[2019-03-05T17:19:58,758][WARN ][logstash.outputs.kafka   ] KafkaProducer.send() failed: org.apache.kafka.common.errors.RecordTooLargeException: The message is 56190054 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. {:exception=>java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 56190054 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.}
[2019-03-05T17:19:58,758][WARN ][logstash.outputs.kafka   ] KafkaProducer.send() failed: org.apache.kafka.common.errors.RecordTooLargeException: The message is 16795508 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. {:exception=>java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 16795508 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.}
[2019-03-05T17:19:58,758][WARN ][logstash.outputs.kafka   ] KafkaProducer.send() failed: org.apache.kafka.common.errors.RecordTooLargeException: The message is 49519247 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. {:exception=>java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 49519247 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.}
[2019-03-05T17:19:58,758][WARN ][logstash.outputs.kafka   ] KafkaProducer.send() failed: org.apache.kafka.common.errors.RecordTooLargeException: The message is 21149866 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. {:exception=>java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 21149866 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.}
[2019-03-05T17:19:58,758][INFO ][logstash.outputs.kafka   ] Sending batch to Kafka failed. Will retry after a delay. {:batch_size=>4, :failures=>4, :sleep=>0.1}

If I'm reading those errors correctly, they're saying the events are 56 MB, 16 MB, 49 MB, and 21 MB in size. That doesn't seem right. I've tried restarting the Logstash agent; it works fine for a few hours and then starts throwing these errors again. Is there any way I can inspect the messages it's trying to send, so I can figure out what I need to fix?
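One idea I've been toying with is tagging anything over roughly 1 MB (the Kafka default for max.request.size) in a ruby filter and dumping it to a local file so I can look at it. This is just an untested sketch; the threshold and the output path are placeholders I made up:

filter {
  # Untested sketch: tag any event whose message field is bigger than ~1 MB
  ruby {
    code => "event.tag('oversized') if event.get('message').to_s.bytesize > 1_000_000"
  }
}
output {
  # Write tagged events to a local file so the offending content can be inspected
  if "oversized" in [tags] {
    file {
      path => "/var/log/logstash/oversized-events.log"
    }
  }
}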

Thanks!

Still looking for some help here. Here are some more details:

My Logstash agent throws the following error while parsing Docker logs from Kubernetes, and then starts complaining about not being able to send giant messages to Kafka. (I had to shorten the initial error message because it went over the character limit for this forum.)

[2019-03-09T03:14:02,904][ERROR][logstash.codecs.json     ] 0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0
000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000<TRUNCATED A BUNCH MORE OF "\u0000\" TO FIT INTO THE elastic.co FORUM>{\"log\":\"2019-03-09 11:14:02.017 [INFO][84] health.go 121
: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2019-03-09T11:14:02.017969539Z\"}"}
[2019-03-09T03:14:06,516][WARN ][logstash.outputs.kafka   ] KafkaProducer.send() failed: org.apache.kafka.common.errors.RecordTooLargeException: The message is 27197885 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration. {:exception=>java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 27197885 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.}
[2019-03-09T03:14:06,519][INFO ][logstash.outputs.kafka   ] Sending batch to Kafka failed. Will retry after a delay. {:batch_size=>1, :failures=>1, :sleep=>0.1}

Here is what my config looks like:

input {
  file {
    path  => "/var/log/containers/*.log"
    codec => json {
      charset => "UTF-8"
    }
    tags  => ["kubernetes", "docker"]
  }
}
filter {
    mutate {
      rename => { "log" => "message" } # move the log field to message
      strip  => "message" # removes trailing "\n" that docker puts on all log messages
    }
    date {
      match        => ["[time]", "ISO8601"]
      remove_field => ["[time]"]
    } 
}
output {
    kafka {
      bootstrap_servers => "kafkaserver:9092"
      codec             => "json"
      topic_id          => "logstash"
      message_key       => "k8snode_%{type}" 
      compression_type  => "gzip"
    }
}
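
For what it's worth, I know I could just raise the limit on the producer side. If I understand the docs correctly, the kafka output exposes a max_request_size setting (and the brokers would also need message.max.bytes raised to match), along the lines of the snippet below. But 20-50 MB events are clearly wrong to begin with, so I'd rather find and fix the source:

output {
    kafka {
      bootstrap_servers => "kafkaserver:9092"
      codec             => "json"
      topic_id          => "logstash"
      message_key       => "k8snode_%{type}"
      compression_type  => "gzip"
      # Producer-side max.request.size in bytes (example value: 10 MB);
      # the brokers also need message.max.bytes raised to accept this
      max_request_size  => 10485760
    }
}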

Thanks for any input!

I'm kind of disappointed I got zero help or even a comment here from anyone who works at elastic...
</end rant>

For anyone who stumbles upon this in the future: I did a bunch of experimentation and found that there is an issue with Docker rotating its own logs. When log rotation is enabled within Docker itself, Docker starts the newly rotated log file off with a few megabytes or more of repeating "\x00" characters, so the log shows up as a binary MIME type. When Logstash ingests the file and tries to JSON-parse it, you see tons and tons of "logstash.codecs.json" error messages in your logs, and in my case Logstash creates events that are sometimes tens of megabytes in size and can't be sent to the Kafka cluster because they're simply too large.
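
If you can't change how rotation is done, a defensive filter in front of the kafka output should also help. This is only a sketch I haven't run in production; it assumes the garbage always arrives as an event whose message starts with NUL bytes:

filter {
  # Sketch: drop any event whose message begins with the NUL padding that
  # Docker writes at the start of a freshly rotated log file
  ruby {
    code => 'event.cancel if event.get("message").to_s.start_with?("\u0000")'
  }
}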

I was able to work around this by configuring Docker to no longer rotate its own logs and using logrotate.d for log rotation instead.
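
Concretely, that meant dropping the max-size/max-file log-opts from Docker's daemon.json and adding a logrotate rule along these lines (saved as, for example, /etc/logrotate.d/docker-containers). The paths and retention settings below are just an example, adjust them to your setup; copytruncate matters because Docker keeps the log file open:

/var/lib/docker/containers/*/*.log {
  daily
  rotate 5
  compress
  missingok
  copytruncate
}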

For anyone interested, here is the Docker bug that was the root cause of this issue:

