filebeat follows files by inode. If a file is renamed during rotation, filebeat detects that. Filebeat also tries to keep files open, so it can still read files that have already been deleted. But if you're constantly writing logs faster than filebeat can process them in the window while they are available, you risk losing logs, or having filebeat keep all the rotated files open until you run out of disk space.
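You can see why rename-based rotation doesn't confuse filebeat with a quick sketch: moving a file keeps its inode, so the open file handle still points at the same data. This assumes a Linux box with GNU coreutils (`stat -c`); the paths are illustrative.

```shell
# Sketch: logrotate-style rename keeps the inode, which is why filebeat
# can keep reading a rotated file through its open handle.
dir=$(mktemp -d)
touch "$dir/stdout"
before=$(stat -c %i "$dir/stdout")    # inode before rotation
mv "$dir/stdout" "$dir/stdout.1"      # typical rotation: a rename, not a copy
after=$(stat -c %i "$dir/stdout.1")   # inode after rotation
echo "before=$before after=$after"
rm -rf "$dir"
```

The two inode numbers printed are identical, because `mv` within a filesystem only changes the directory entry.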
Understood the possibilities of dataloss, but I'm not clear on...
stdout is the original file and it rolls over to stdout.1, so filebeat is reading both, correct?
Once the logs rollover to stdout.3 will it read all 3 or drop one of them?
Also is it better for filebeat to read 10 files of 10MB each or 1 file of 100MB?
It depends. Your glob pattern matches stdout only. By default, if stdout is rotated to stdout.1, filebeat continues reading from the already-open file handle and starts a new harvester for stdout, and so on. But if filebeat is restarted in between, it won't be able to find stdout.1, because your config doesn't ask for stdout.1 to be processed. Better to have your pattern end with stdout*. That way filebeat can also continue reading old log files after a restart. A file's identity is not its filename but its inode. Log rotation normally updates the filename with a move, which doesn't change the inode; the filename is only used to find the files you want to publish.
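Concretely, a prospector section along these lines would pick up the rotated files too (the path is illustrative, and I'm assuming the 5.x `filebeat.prospectors` config layout):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      # glob matches stdout, stdout.1, stdout.2, ... so rotated
      # files are still found after a filebeat restart
      - /var/log/myapp/stdout*
```

With the trailing `*`, a restarted filebeat rediscovers stdout.1 by its inode in the registry and resumes from the recorded offset instead of dropping it.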
Also is it better for filebeat to read 10 files of 10MB each or 1 file of 100MB?
I don't have a general answer for this. More files means filebeat can process files more concurrently (a drawback might be more seeks on hard drives). Too many files (10 files are not many) can slow down scanning for files and create some additional contention on the shared event queue. But the event queue holding the events produced by each file reader is the same no matter how many files you have. Normally the bottleneck (unless you use some weird network storage) is the outputs, not the file reading.
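The shared-queue point can be sketched in a few lines. This is a conceptual model, not filebeat code: several "harvesters" (one per file) feed a single bounded queue, and one consumer stands in for the output. However many readers you add, everything funnels through that one queue, so a slow output caps throughput either way.

```python
import queue
import threading

events = queue.Queue(maxsize=100)  # the one shared event queue

def harvester(name, lines):
    """One reader per file; all of them put into the same queue."""
    for line in lines:
        events.put((name, line))   # blocks when the queue is full

def output(total, sink):
    """Single consumer draining the queue -- the usual bottleneck."""
    for _ in range(total):
        sink.append(events.get())

sink = []
readers = [threading.Thread(target=harvester, args=(f"file{i}", ["a", "b"]))
           for i in range(10)]      # 10 files, 2 events each
consumer = threading.Thread(target=output, args=(20, sink))
for t in readers:
    t.start()
consumer.start()
for t in readers:
    t.join()
consumer.join()
print(len(sink))  # all 20 events arrive through the single queue
```

Whether the 20 events come from 10 files or 1, the consumer does the same amount of work, which is why file count matters less than output capacity.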
Given you are using Mesos, I guess you actually have 10 × the number of services files. In that case it might be better to reduce the total number of files being forwarded.