Hello,
I use Logstash 6.4 and Filebeat 6.4 to collect a large log file (just a test). The file as a whole is not the problem; rather, each individual line is large, around 10 KB per line.
When each log line is only 1 KB, the test results are normal. But with 10 KB per line, I found that Logstash processes 2048 lines of log data at a time (this is Filebeat's default batch size; my output configuration is shown below). The output plugin handles each batch in a very short time, but there is a gap of 20 to 30 seconds between consecutive batches. I changed Logstash's log level to DEBUG and found that many entries like this were printed during those 20 seconds:
[2019-01-21T08:58:52,131][DEBUG][logstash.pipeline ] "MY_TEST_LOG_CONTENT", "offset"=>82839449, "host"=>{"name"=>"localhost.localdomain"}, "@version"=>"1", "beat"=>{"name"=>"localhost.localdomain", "version"=>"6.4.0", "hostname"=>"localhost.localdomain"}, "tags"=>["beats_input_codec_plain_applied"], "source"=>"/home/logstash/app/testlog_10k/test1.log", "input"=>{"type"=>"log"}}}
After that, there is a log entry like "Pushing flush onto pipeline". I think that during each 20-second interval, Logstash's beats input plugin is still processing the received log data. Taking 10 KB per line as an example, 2048 lines is only about 20 MB, so why is the beats input plugin so slow at handling it?
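For reference, the 2048 lines per batch corresponds to Filebeat's bulk_max_size, which I have left at its default. This is a minimal sketch of the relevant part of my filebeat.yml; the Logstash host address is a placeholder:

```yaml
# filebeat.yml (output section) -- host address is a placeholder
output.logstash:
  hosts: ["my-logstash-host:5044"]
  # bulk_max_size defaults to 2048, which matches the batch size I see in Logstash
  #bulk_max_size: 2048
```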
I used Wireshark to look at the network transfer between the Filebeat server and the Logstash server, and found that the 20 MB of data was transferred within 1 to 2 seconds. I also tested with Logstash's file input plugin, using the same log file, and Logstash processed the data very quickly.
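The file input comparison test was roughly along these lines (the stdout output is a simplified stand-in for the output plugin I actually use):

```
input {
  file {
    path => "/home/logstash/app/testlog_10k/test1.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
output {
  # simplified stand-in for my real output plugin
  stdout { codec => dots }
}
```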
So I think the beats input plugin becomes very slow when dealing with log files that have large lines. What can I do to avoid this? I hope you can help.
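Would tuning the pipeline settings in logstash.yml help here? As I understand it, these are the relevant settings and their defaults (I have not confirmed this is the cause):

```yaml
# logstash.yml -- relevant pipeline settings with their defaults, as I understand them
pipeline.workers: 8        # defaults to the number of CPU cores (8 on my server)
pipeline.batch.size: 125   # events each worker collects before running filters and outputs
pipeline.batch.delay: 50   # milliseconds to wait before flushing an undersized batch
```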
Here is the environment I use:
Server: RHEL 6.8 with an 8-core CPU
Versions: Filebeat 6.4 and Logstash 6.4