Filebeat Implementation Details

Hi, I'm using Filebeat on a ton of servers in my production environment and have some questions.

Does Filebeat compress logs while they're being streamed? We're trying to limit bandwidth usage between our servers and want to figure out the best way to compress logs in transit. Is it possible to stream gzipped logs? If not, does anyone have a tool that compresses on the fly?

When load balancing from Filebeat to Logstash, I can't see anything indicating that the order of log lines is preserved. Is it impossible to load balance while preserving the order of logs sent to ES?

Thank you

Does Filebeat compress logs while they're being streamed?

I don't think so.

When load balancing from Filebeat to Logstash, I can't see anything indicating that the order of log lines is preserved. Is it impossible to load balance while preserving the order of logs sent to ES?

ES itself doesn't maintain the order of inserted documents, so unless you have a field with a monotonically increasing integer (like the log file's line number or file offset) there is no ordering apart from the implied order given by the timestamp.
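For example, read order can be reconstructed after the fact from the timestamp plus such a field. Here's a minimal Go sketch of that idea (the `Event` struct and its fields are illustrative stand-ins, not Filebeat's actual types; the sort runs client-side after fetching the documents):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Event is an illustrative stand-in for an indexed log event.
type Event struct {
	Timestamp time.Time // when the line was read
	Offset    int64     // byte offset of the line within the source file
	Line      string
}

func main() {
	t := time.Now()
	// Events arrive out of order, e.g. after load-balanced delivery.
	events := []Event{
		{t, 120, "third line"},
		{t.Add(-time.Second), 0, "first line"},
		{t.Add(-time.Second), 60, "second line"},
	}

	// Sort by timestamp, breaking ties with the monotonically
	// increasing file offset to recover the original read order.
	sort.Slice(events, func(i, j int) bool {
		if !events[i].Timestamp.Equal(events[j].Timestamp) {
			return events[i].Timestamp.Before(events[j].Timestamp)
		}
		return events[i].Offset < events[j].Offset
	})

	for _, e := range events {
		fmt.Println(e.Offset, e.Line)
	}
}
```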

When using the Logstash output, the events are always compressed with zlib (at compression level 3). There is no compression when sending straight to Elasticsearch.

Is there any way to modify the compression level so that we can experiment with it?

Unfortunately, not at this time. The value 3 is hard-coded in the Logstash output; see the sketch below. You could open an enhancement request in the beats repo for this issue, or better yet, open a PR. :slight_smile:
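This is not the actual beats source, just a minimal Go sketch of what the hard-coding amounts to, assuming the standard `compress/zlib` package: the writer is created with a fixed level instead of one taken from the output config (the constant name here is made up):

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"strings"
)

// Hypothetical constant mirroring the hard-coded value in the
// Logstash output; it is not read from any configuration.
const compressionLevel = 3

func main() {
	// A fake batch of log lines standing in for a publish window.
	payload := []byte(strings.Repeat("INFO something happened on the server\n", 200))

	var buf bytes.Buffer
	w, err := zlib.NewWriterLevel(&buf, compressionLevel)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write(payload); err != nil {
		panic(err)
	}
	w.Close()

	fmt.Printf("raw: %d bytes, compressed: %d bytes\n", len(payload), buf.Len())
}
```

Running it prints the raw vs. compressed sizes, which is also a quick way to estimate the bandwidth saving on your own log samples.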

hi,

  1. As already mentioned, when publishing to Logstash, data is compressed with zlib at compression level 3. Configuring the compression level is not supported yet, but I'm planning to add it soon.

  2. In general, when using load balancing we cannot guarantee any timely indexing order, for reasons including these:

  • after sending to two different Logstash instances, one instance might outpace the other
  • if one Logstash instance fails during load balancing, lines have to be resent to another instance, which might already have processed subsequent lines

This does not mean your data are all out of order. The timestamp sent by Filebeat is the time the line was read; if you use grok to parse the timestamp out of the log lines themselves, you get more exact timestamps. We also ship an offset (the file offset in bytes), which gives you some ordering information.