Hi, I have a few services that create a lot of logs, and I want to start using ES + Filebeat to parse them. Once the logs are parsed and sent to ES I don't need them anymore, and since I'm forced to use an on-premise server with a limited hard disk, the server shuts down and the application goes offline (which can't happen!) whenever the Docker logs fill up the whole disk. So I limit the Docker logs in docker-compose to 50 MB (max-size: "50m"). My question is: will this cause problems for Filebeat if its harvesting interval is small enough? In other words, will I miss any logs in ES because of limiting the Docker log size?
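For reference, this is roughly how I cap the logs in docker-compose (the service name and image are just placeholders):

```yaml
services:
  my-service:              # placeholder service name
    image: my-service:latest
    logging:
      driver: "json-file"
      options:
        max-size: "50m"    # each container log file is capped at 50 MB
```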
When your container's log file crosses 50 MB in size, it will get truncated. As with any truncation scenario, log lines that Filebeat hasn't "seen" yet might be lost. Your best bet would be to set backoff to something very low, so that Filebeat aggressively looks for new log lines, which increases its chances of staying caught up before truncation occurs.
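As a rough sketch, assuming the default Docker json-file log location (adjust the path to wherever your container logs actually live), the relevant input options look something like this:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/lib/docker/containers/*/*-json.log   # assumed Docker json-file log path
    backoff: 1s        # wait only 1s after hitting EOF before checking for new lines
    max_backoff: 2s    # keep the backoff from growing much during quiet periods
```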
Of course, once the file is truncated, Filebeat will detect this and start from the beginning of the file.
Thanks for the info! But let's say I limit the file to 50 MB, which means I will always have logs for at least the last 5 minutes (the log volume is very predictable), and I set backoff to 1s... will I get duplicates in ES? Duplication may not be the exact term; what I basically mean is multiplication of the same log lines, because the log file will be truncated and Filebeat will pick them up again. If that's true, I'm afraid I can't use it. Where can I read more about this (for a person not familiar with the Go language)?
Filebeat internally maintains a "byte offset" of where to start reading from in a file. Initially that is set to 0. As Filebeat consumes log lines from the file, that byte offset gets incremented. When a file is truncated, Filebeat detects that scenario and resets the byte offset to 0.
With a sufficiently low (i.e. aggressive) backoff, Filebeat will be reading log lines as quickly as it can and incrementing this internal byte offset. Ideally, most of the time the byte offset will be at EOF, that is, Filebeat is completely caught up. When the file gets truncated, the byte offset will be reset to 0 and Filebeat will start reading log lines from the top of the file again.
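If you want to see that offset for yourself, older Filebeat versions keep it in a plain JSON registry file on disk (the location and exact fields vary by version, so treat this as an illustration rather than the exact format):

```json
[
  {
    "source": "/var/lib/docker/containers/abc123/abc123-json.log",
    "offset": 52428800,
    "type": "log",
    "ttl": -1
  }
]
```

When the file shrinks below that stored offset, the offset is reset to 0 and reading starts over from the top of the file.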
So I'm not sure how we would end up with duplicates in ES, but perhaps I'm missing something so please feel free to point it out!