Filebeat: output to kafka checks message size before compression

According to the Filebeat documentation of the Kafka output, for the key max_message_bytes:

The maximum permitted size of JSON-encoded messages. Bigger messages will be dropped. The default value is 1000000 (bytes). This value should be equal to or less than the broker’s message.max.bytes.

Then, looking at Kafka's documentation for the broker setting message.max.bytes:

The largest record batch size allowed by Kafka (after compression if compression is enabled).

Based on these two statements, I would expect the size limit in the Filebeat configuration to apply to compressed data, just as it does for Kafka.

So I would assume that if I set:

  • On the Kafka side, message.max.bytes set to more than 1 million bytes (for example, the default value)
  • On the Filebeat side, the following configuration:
# Logging configuration
logging.level: error
logging.to_stderr: true
logging.to_files: false
logging.json: true

# Inputs
filebeat.inputs:

- type: log
  paths:
  - "some_file_with_json_lines.log"
  encoding: utf-8
  json:
    keys_under_root: false
    add_error_key: true
  fields:
    topic: test_data
  fields_under_root: true

# Outputs
output.kafka:
  hosts: ["localhost:9092"]
  version: 2.0.0
  topic: '%{[topic]}'
  key: '%{[json.key]}'
  client_id: 'data_filebeat'
  compression: gzip
  max_message_bytes: 1000000
  required_acks: -1
  worker: 1

and then have a file with JSON data (containing a json.key field), I would expect messages of more than 1 million bytes to be processed, since they would be compressed with gzip at level 4 (from my tests with the gzip command line, something around 150 kB for a 1.8 MB line) and would therefore stay below the 1 MB limits set in Filebeat and in Kafka.
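As a side note, this compression-ratio expectation can be sanity-checked in code. The following Go sketch is mine and is unrelated to Filebeat itself: it gzips a synthetic ~1.8 MB line at level 4 and prints the raw and compressed sizes. The synthetic payload compresses far better than real log data, so only the orders of magnitude are meaningful.

// Minimal sketch: compare the raw size of a (synthetic) log line with its
// size after gzip compression at level 4, mirroring the command-line test.
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

func main() {
	// Synthetic ~1.8 MB JSON-like line; substitute a real line from the log file.
	line := []byte(`{"key":"huge","payload":"` + strings.Repeat("abcdefgh", 225000) + `"}`)

	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, 4) // level 4, as mentioned above
	if err != nil {
		panic(err)
	}
	if _, err := zw.Write(line); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	fmt.Printf("raw size:          %d bytes\n", len(line))
	fmt.Printf("gzip level 4 size: %d bytes\n", buf.Len())
}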

But this is not the case; I get the following errors:

  • filebeat 6.5.4:
{"level":"error","timestamp":"2021-07-26T14:11:48.640+0200","caller":"kafka/client.go:234","message":"Kafka (topic=test_data): dropping too large message of size 1893795."}
  • filebeat 7.13.4:
{"level":"error","timestamp":"2021-07-26T14:27:29.783+0200","logger":"kafka","caller":"kafka/client.go:345","message":"Kafka (topic=test_data): dropping too large message of size 1893898."}

But after changing Filebeat's max_message_bytes configuration, it works. More precisely, it still fails with a maximum size of 1893000 (a bit below the size reported for the huge entry in the data file) and works with a maximum size of 1894000 (a bit above that reported size). In other words, the limit appears to be compared against the uncompressed message size; the Sarama sketch below reproduces this behaviour outside of Filebeat.
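If I understand the code path correctly, the drop comes from the Kafka client library that Filebeat uses, so the same pre-compression limit should be reproducible with Sarama alone. The sketch below is my own reproduction aid, not Filebeat code: it assumes a reachable broker on localhost:9092 and a test_data topic, and uses the upstream github.com/Shopify/sarama import path rather than whatever fork or version Filebeat actually vendors.

// Sketch: send one oversized, highly compressible message through Sarama with
// gzip enabled and Producer.MaxMessageBytes left at 1000000. Given the
// behaviour observed above, the send is expected to fail with
// ErrMessageSizeTooLarge even though the compressed payload is well under 1 MB.
package main

import (
	"bytes"
	"fmt"

	"github.com/Shopify/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_0_0_0
	cfg.Producer.Compression = sarama.CompressionGZIP
	cfg.Producer.CompressionLevel = 4
	cfg.Producer.MaxMessageBytes = 1000000
	cfg.Producer.RequiredAcks = sarama.WaitForAll
	cfg.Producer.Return.Successes = true // required by the sync producer

	// Assumes a broker is running locally; adjust the address as needed.
	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		panic(err)
	}
	defer producer.Close()

	// ~1.9 MB payload that compresses far below 1 MB.
	value := bytes.Repeat([]byte(`{"k":"v"}`), 210000)

	_, _, err = producer.SendMessage(&sarama.ProducerMessage{
		Topic: "test_data",
		Value: sarama.ByteEncoder(value),
	})
	fmt.Println("send result:", err)
}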

Conclusion: there are two ways to deal with this observation:

  • consider that max_message_bytes has always meant "size before compression" and change the Filebeat documentation to say, for example:

The maximum permitted size of JSON-encoded messages before compression. Bigger messages will be dropped. The default value is 1000000 (bytes). This value should be equal to or less than the broker’s message.max.bytes when no compression is enabled.

This little change would then make it a bit clearer.

  • or consider that the documentation is right and update Filebeat (or the Sarama library it relies on?) so that the message size is evaluated only after compression; a rough sketch of such a check follows below.
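To make the second option concrete, here is what an "after compression" check could look like. It reuses the same gzip call as the earlier sketch; the function name and structure are hypothetical and are not taken from Sarama or Filebeat.

// Hypothetical illustration of an "after compression" size check: compress the
// value first, then compare the compressed size to the configured limit.
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// fitsAfterCompression reports whether value, gzipped at the given level,
// stays within maxMessageBytes, and returns the compressed size.
func fitsAfterCompression(value []byte, level, maxMessageBytes int) (bool, int, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, level)
	if err != nil {
		return false, 0, err
	}
	if _, err := zw.Write(value); err != nil {
		return false, 0, err
	}
	if err := zw.Close(); err != nil {
		return false, 0, err
	}
	return buf.Len() <= maxMessageBytes, buf.Len(), nil
}

func main() {
	value := bytes.Repeat([]byte(`{"k":"v"}`), 210000) // ~1.9 MB, compresses well below 1 MB
	ok, compressed, err := fitsAfterCompression(value, 4, 1000000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d compressed=%d accepted=%v\n", len(value), compressed, ok)
}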
