ELK - guaranteed lossless?

(Dan) #1

Hi folks, I'm trying to understand what the best setup is to guarantee zero lost messages.

I know Logstash will not purposely lose messages. However, I have seen references to people pulling in Redis to ensure they don't lose messages, which seems like an odd requirement. I thought that in this configuration Redis just acts as a bigger 'insert' buffer in front of Elasticsearch, since Elasticsearch can get slow when dealing with lots of inserts (I'm assuming Elasticsearch itself will not drop inserts if it's under heavy load and instead just gets slow).

I read that Logstash uses a 20-message buffer, broken into 3 phases (for a total of 60 messages in the pipeline).
Question: if Logstash crashes with a full pipeline, will it pick up those messages again when it comes back online? Or does it treat those messages as already processed?


(Mark Walkom) #2

Currently it'll lose them; 2.0 will have persistence, though.

(Suyog Rao) #3

I'll address Logstash's message delivery concerns here. Elasticsearch is out of scope for this, but you can easily find our current progress and work here: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

If you have specific questions about Elasticsearch, we can address them in the Elasticsearch category on Discuss.

@warkolm is correct. We are working on making LS resilient to crashes w.r.t. message loss. Today, LS does not offer any delivery guarantees such as at-least-once or at-most-once. The software's core philosophy is to not lose messages intentionally, so we try hard to fix bugs which result in crashes. For this reason, the internal buffer between the different stages is capped at 20 events. So at most, there are 40 events in flight (in memory) in LS that can be lost when there is a hard crash. Using a message broker between the LS tiers (shippers and indexers) can mitigate this message loss. For example, if you use Apache Kafka, it provides a way to replay messages whose offsets were not committed to ZooKeeper (using the LS Kafka input). LS 1.5 natively supports Kafka, so this is an option.
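
As a rough sketch of the shipper/indexer layout described above, the two configs below put Kafka between the tiers. The option names (`broker_list`, `zk_connect`, `topic_id`, `group_id`) are the ones I understand the 1.5-era kafka plugins to use; later plugin versions renamed them, so check the docs for your version. The host names, topic, group and file path are made up for illustration.

```
# shipper.conf (assumed file name): read local logs, publish to Kafka
input {
  file {
    path => "/var/log/app/*.log"      # hypothetical path
  }
}
output {
  kafka {
    broker_list => "kafka1:9092"      # hypothetical broker
    topic_id    => "logs"             # hypothetical topic
  }
}

# indexer.conf (assumed file name): consume from Kafka, index into Elasticsearch
input {
  kafka {
    zk_connect => "zk1:2181"          # hypothetical ZooKeeper host
    topic_id   => "logs"
    group_id   => "logstash-indexers" # consumer group whose offsets are tracked
  }
}
output {
  elasticsearch { }                   # connection settings omitted
}
```

With this layout, events that an indexer had in memory during a hard crash can be read again from the topic, provided their offsets had not yet been committed. Events sitting in the shipper's own in-memory buffers are still exposed.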

For 2.0, we are working on persisting these in-flight messages to disk, so they can be recovered after a hard crash.

You can track our work on this using the resiliency label: https://github.com/elastic/logstash/labels/resiliency

(Suyog Rao) #4

Correct, but there are only 2 queues: input -> filter and filter -> output, each with a capacity of 20.

Not necessarily. Message queues can introduce latency when you are indexing events, but not by a big margin. Also, if message loss is a concern, you need to accept some latency.

Check our documentation here for this topic: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html#deploying-message-queueing

(Dan) #5

Thank you all for the quick responses.

One follow-up question: in some of our previous usage (we are using the multiline plugin), we found some extremely large rows in Elasticsearch. Is there a way to put a hard cap on message length, to ensure that:

  1. Logstash does not attempt to process a massive message that cannot fit into memory
  2. Logstash does not attempt to insert a 'corrupted' (i.e. too long) message into Elasticsearch

(Suyog Rao) #6

Specifically for the multiline plugins (filter and codec), we have the max_lines and max_bytes options so you don't run out of memory: https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html#plugins-codecs-multiline-max_lines
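
For illustration, a multiline codec with both limits set might look like the sketch below. The path, the pattern and the limit values are placeholders rather than recommendations; pick limits that match the largest legitimate event you expect.

```
input {
  file {
    path => "/var/log/app/app.log"   # hypothetical path
    codec => multiline {
      pattern   => "^\s"             # example: lines starting with whitespace...
      what      => "previous"        # ...are appended to the previous line
      max_lines => 200               # example cap on lines per event
      max_bytes => "1 MiB"           # example cap on bytes per event
    }
  }
}
```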

In general, if you want to drop events (or individual fields) which are too big, you can use the range filter: https://www.elastic.co/guide/en/logstash/current/plugins-filters-range.html#plugins-filters-range-ranges
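
A minimal sketch of the range filter used this way, assuming the field to check is `message` and that 10000 characters is the cutoff you want (both are example choices). For string fields the filter compares the string length against the range.

```
filter {
  range {
    # field, min, max, action: drop events whose message is longer than 10000 characters
    ranges => [ "message", 10001, 1000000000, "drop" ]
  }
}
```

Note that this check runs in the filter stage, after the event has already been read into memory, so it addresses oversized documents reaching Elasticsearch rather than memory use at input time. The multiline limits are the ones that help with the latter.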
