I am using Filebeat, Logstash, and Elasticsearch to collect logs.
I have received an inquiry saying that some logs are not being captured by Elasticsearch.
Is it possible to lose logs in this ELK server configuration?
For example, if the server with Filebeat or Logstash is under heavy load.
Or is there a time limit when log delivery is delayed? Is there a specification such that logs are discarded if they cannot be delivered within a certain period of time?
Yes, I agree with you. It is the same issue in terms of the purpose of preventing log loss.
However, the former asks whether logs can be lost if the server goes down.
Here I am asking in which cases logs can be lost.
For example, are there cases where delivery times out?
If so, we would like to know if there is a setting that would extend the timeout period.
Yes, there is, but it is not exactly a timeout; it is related to the event queue.
Both Filebeat and Logstash have a queue where events are buffered, and by default this queue is an in-memory queue in both cases.
If Elasticsearch becomes unavailable, the queue will start to fill up, and once the queue is full no new events will be accepted. Depending on your source, this can lead to data loss.
For example, if you are consuming logs from a file and that file rotates during this period, data can be lost; the same applies if you are receiving events over TCP or UDP ports.
One way to help with problems like this is to increase the queue size. With a memory-based queue this means more memory usage, but you can also switch to a disk-based queue, where the buffered events are written to disk.
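For Logstash, for example, switching to the disk-based (persistent) queue is a setting in logstash.yml; a minimal sketch could look like this (the size value is only an illustration, not a recommendation):

```yaml
# logstash.yml -- minimal sketch, the size value is only an example
queue.type: persisted    # switch from the default in-memory queue to the disk-based queue
queue.max_bytes: 4gb     # upper bound on the disk space the queue may use
```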
Another option is to use a message queue like Kafka.
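If you go the Kafka route, Filebeat can ship directly to Kafka using its Kafka output; a rough sketch of filebeat.yml could look like this (the broker addresses and topic name are placeholders):

```yaml
# filebeat.yml -- rough sketch, broker addresses and topic name are placeholders
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "filebeat-logs"
```

Logstash would then consume from that Kafka topic instead of receiving events directly from Filebeat, so a slow or unavailable Elasticsearch no longer pushes back all the way to the log source.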
For Logstash I have already set queue.type: persisted.
Please let me know if there is any indicator of the queue size.
The ELK servers we are using have 32 GB of memory and 4 TB disks in a cluster configuration.
I see that there is a setting for Filebeat as well. Thanks for letting me know.
My concern is whether increasing the size of the queue used by Filebeat will lead to a decrease in the performance of the server.
I want to avoid overloading the server that delivers the logs as much as possible.
It is all in the documentation. You can configure the maximum size and the maximum number of events; by default the queue accepts an unlimited number of events but is limited to 1 GB in size. Whether you need to increase it depends entirely on your use case. Note that your disk must have enough free space to store the queue, and that this configuration is per pipeline: if you have 3 pipelines in pipelines.yml, then by default you need 3 GB of space.
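As a rough illustration of the per-pipeline aspect (the pipeline ids, paths and sizes below are placeholders), pipelines.yml could look like this:

```yaml
# pipelines.yml -- sketch only, ids/paths/sizes are placeholders
- pipeline.id: app-logs
  path.config: "/etc/logstash/conf.d/app-logs.conf"
  queue.type: persisted
  queue.max_bytes: 2gb    # per-pipeline budget for the on-disk queue
- pipeline.id: system-logs
  path.config: "/etc/logstash/conf.d/system-logs.conf"
  queue.type: persisted   # no max_bytes set, so the 1 GB default applies
```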
Yes, both Logstash and Filebeat disk queues can impact server performance and the overall ingestion rate. When using disk queues, the events received by the input are written to disk and then read back from disk, so there is a lot of I/O from those reads and writes; if your disk is not fast enough, this can lead to performance degradation.
This is also explained in the documentation; please check the documentation links.
The disk queue stores pending events on the disk rather than main memory. This allows Beats to queue a larger number of events than is possible with the memory queue, and to save events when a Beat or device is restarted. This increased reliability comes with a performance tradeoff, as every incoming event must be written and read from the device’s disk. However, for setups where the disk is not the main bottleneck, the disk queue gives a simple and relatively low-overhead way to add a layer of robustness to incoming event data.
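For reference, enabling the disk queue in Filebeat is a queue setting in filebeat.yml; a minimal sketch could look like this (the size is just an example value):

```yaml
# filebeat.yml -- minimal sketch of the Beats disk queue, size is an example value
queue.disk:
  max_size: 10GB   # maximum disk space the queue is allowed to use
```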
It is not possible to know whether this will impact your use case or not; you will need to test and validate it.