We use Filebeat to tail log files on disk and ship them to Kafka. We observed a significant delay between when logs are written to disk and when Filebeat processes them and sends them to Kafka. For example:
For READER-6.log, there is nearly a 45-second delay between the C++ log timestamp and when Filebeat processes the lines and sends them to Kafka.
For READER-24.log and READER-34.log, the delay is ~38 minutes.
I understand that different log files are processed and sent to Kafka independently, but I am still surprised to see such a big discrepancy between files. Is this expected? And how can we reduce the discrepancy so that every file is processed at roughly the same pace?
The left side of "|" is the Filebeat timestamp; the right side of "|" is the C++ log timestamp.
2018-08-20T23:00:57.680Z|I0820 23:00:12.257315 READER-6.log
2018-08-20T23:00:57.680Z|I0820 23:00:12.258847 READER-6.log
2018-08-20T23:00:57.680Z|I0820 23:00:12.266106 READER-6.log
2018-08-20T23:00:57.680Z|I0820 22:23:20.357524 READER-34.log
2018-08-20T23:00:57.680Z|I0820 22:23:20.357533 READER-34.log
2018-08-20T23:00:57.680Z|I0820 22:23:20.357797 READER-34.log
2018-08-20T23:00:57.680Z|I0820 22:23:25.410398 READER-24.log
2018-08-20T23:00:57.680Z|I0820 22:23:25.411065 READER-24.log
2018-08-20T23:00:57.680Z|I0820 22:23:25.414383 READER-24.log
By the way, in case it is relevant: we limit Filebeat to a single CPU core and run it under nice.
I would like to know whether we are being too aggressive in limiting Filebeat to one core and nicing it.
Between the core limit and nice, which one is more likely to cause problems? And is there an easy way to determine how many cores Filebeat needs to keep up with the rate at which logs are generated on disk, so that they can be shipped out in a timely manner, ideally within a minute if not within a few seconds?
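In case it helps frame the question: below is a minimal sketch of how the same core cap could be expressed in Filebeat itself rather than via an external affinity mask, plus how to make Filebeat report its own throughput. As far as I know, max_procs and the logging.metrics options are standard Beats settings; the values are illustrative, not a recommendation.

# filebeat.yml (fragment) -- illustrative values only
max_procs: 1                    # upper bound on CPUs the Go runtime may use (GOMAXPROCS)
logging.metrics.enabled: true   # periodically write internal throughput metrics to Filebeat's own log
logging.metrics.period: 30s     # interval between metric snapshots

Watching the periodic metrics lines in Filebeat's log while raising max_procs step by step would show whether the single-core cap is the bottleneck.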
The amount of resources Filebeat requires depends on the number of files monitored and the volume of data collected. Have you previously experienced issues with Filebeat resource usage, given that you have limited it so severely? Since it is niced, it will probably have difficulty keeping up whenever the host gets busy, especially if busy periods correlate with larger volumes of logs being generated, but that may be the intention.
@Christian_Dahlqvist Right, we understand that lagging behind depends on log volume; this happened when we recently started ramping up traffic (and therefore increased the log volume).
I apologize if my last post was unclear. I would like to understand the following:
(1) Besides trial and error (gradually increasing the number of cores given to Filebeat and observing), is there an easy way to determine how many cores Filebeat needs to keep up with the current traffic? (One approach I am considering is sketched after this list.)
(2) Why is there such a huge discrepancy between the shipping delays of different log files? I assume each log file is handled by a separate goroutine. What schedules the different goroutines? I am concerned about unfairness in the Go scheduler.
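The approach referenced in (1), assuming a Filebeat version that exposes the HTTP stats endpoint (6.3+, I believe); the host and port below are just the defaults:

# filebeat.yml (fragment) -- assumes the HTTP stats endpoint is available in your version
http.enabled: true      # expose internal metrics over a local HTTP endpoint
http.host: localhost
http.port: 5066

Then polling something like curl -s localhost:5066/stats once a minute lets you compare the output's acked-events counter (libbeat.output.events.acked, if I am reading the stats layout correctly) against the rate at which the C++ processes append lines; if acked events consistently trail the write rate, Filebeat is falling behind regardless of which file it happens to be reading.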
What configuration file? Do you mean the filebeat.yaml file?
As I posted above: it is a 45-second delay for one file and a 38-minute delay for the other two files. These log files are generated in a similar way.
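For completeness, my current understanding of why the discrepancy can happen, plus the knobs I plan to experiment with: each file gets its own harvester goroutine, but all harvesters publish into one shared in-memory queue that the Kafka output drains. If the output cannot keep up, harvesters block on the full queue, and as far as I can tell there is no per-file fairness guarantee, so some files can fall much further behind than others. The keys below are standard filebeat.yml queue options; the sizes are illustrative, not a recommendation.

# filebeat.yml (fragment) -- illustrative sizes only
queue.mem.events: 4096            # total events buffered across all harvesters
queue.mem.flush.min_events: 512   # minimum batch size handed to the Kafka output
queue.mem.flush.timeout: 1s       # flush at least this often, even if the batch is small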