Filebeat splits messages after 16k

# filebeat 6.2.2
filebeat.prospectors:
- paths: [ "/var/lib/docker/containers/*/*-json.log" ]
  harvester_buffer_size: 65536
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
  fields_under_root: true
  processors:
    - add_docker_metadata: ~

output.file:
  path: "/tmp/filebeat"
  filename: filebeat
• The Docker container outputs a single line of valid JSON (about 18k).
• The value of the "log" message key is also a single line of valid JSON.
• The message gets cut off at about 16k, or a bit above, depending on whether you count the backslashes added for escaping.
• A second message gets created with the remaining part of the message, including full decoration (Docker metadata, additional fields, etc.).
• It looks like Filebeat splits the message into two separate events (see the inspection sketch after this list).
• harvester_buffer_size has no effect.
• Removing the json.* options has no effect.
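
One way to narrow down where the split happens is to inspect the raw json-file on disk: every line there is a JSON record that wraps one stdout line in a "log" key (this is what json.message_key decodes). If the longest lines in that file are already capped near 16k, the split happened before Filebeat ever read it. A minimal sketch, assuming the paths from the config above; the container ID is a placeholder:

# Peek at one raw record to see the {"log": ..., "stream": ..., "time": ...} wrapper:
sudo head -1 /var/lib/docker/containers/<container-id>/<container-id>-json.log

# Print the five longest line lengths in the same file; values capped
# near 16k mean the line arrived at Filebeat already split:
sudo awk '{ print length($0) }' \
  /var/lib/docker/containers/<container-id>/<container-id>-json.log \
  | sort -n | tail -5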

Hi @paltryeffort,

We introduced the docker prospector; it handles JSON decoding and timestamp retrieval for you.

Could you confirm whether this still happens with it? The config would look like this:

filebeat.prospectors:
- type: docker
  containers.ids:
    - "*"
  processors:
    - add_docker_metadata: ~
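
If it still splits, one way to make that visible is to run Filebeat in the foreground with this config and measure the events landing in the file output. A rough sketch, assuming the output.file settings from the first post:

# Run Filebeat in the foreground, logging to stderr:
sudo filebeat -e -c filebeat.yml

# In another shell: the file output writes one JSON event per line,
# so a split message shows up as two shorter lines instead of one long one:
awk '{ print length($0) }' /tmp/filebeat/filebeat | sort -n | tail -3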

Hi @exekias,
Thanks for the fast response.
Using your config, the message gets cut off at exactly the same position.
I tried it with and without the harvester_buffer_size option.

We're having this exact same issue. Nothing I have tweaked seems to change this behavior. Interested to see what the solution is.

How to reproduce:
Use my config, start a container, and run the following inside it:

#!/bin/bash
# Emit a single ~18k JSON object on one line: 420 key/value pairs
# followed by a final "lastkey" entry and the closing brace.
key="somerandomkey_"
value="somerandomvalue_"
echo -n '{'
for i in $(seq 420); do
  echo -n "\"${key}${i}\":\"${value}${i}\","
done
echo '"lastkey":"end"}'

This produces valid JSON output. For me, the message cuts off at "somerand" and a new message continues with "omkey_396".
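
For completeness, a sketch of driving this through Docker's json-file logging driver end to end; gen.sh is just a name for the script above, and any image with a POSIX shell should work (busybox echo supports -n, so bash is not strictly required):

# Save the generator as gen.sh, then run it in a throwaway container
# so its stdout goes through the json-file driver:
docker run --rm -v "$PWD/gen.sh:/gen.sh:ro" busybox sh /gen.sh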

Bump on this. @exekias, is there anything else we can try to resolve this issue?

Issue opened here: https://github.com/elastic/beats/issues/6605. Please comment on it with your experiences to get some traction going; it's definitely a deal-breaker!

Thanks for opening the issue. We will try to find some time to look into it.

This is pretty urgent for us; we may have to use something else because of it. For example, apps that generate stack traces, or ModSecurity audit logs being consumed, typically produce entries of this size, and with Filebeat we are not sure how to re-correlate this data in Logstash.

Any suggestions from Elastic for us and others dealing with this?

For everyone coming to this thread, discussion continues in https://github.com/elastic/beats/issues/6605

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.