Logstash 6.4 beats input plugin is too slow when collecting log files with large lines

Hello,

I am using Logstash 6.4 and Filebeat 6.4 to collect a large log file (just a test). It is not that the whole file is huge, but that each line is large, around 10 KB per line.

When each log line is only 1 KB, the test results are normal. But with 10 KB per line, Logstash processes 2048 lines of log data at a time (this is the default configuration of Filebeat), and the output plugin finishes each batch very quickly, but there is a gap of 20 to 30 seconds between batches. I changed the Logstash log level to DEBUG and found that during those 20 seconds it prints many log entries like this:

[2019-01-21T08:58:52,131][DEBUG][logstash.pipeline ] "MY_TEST_LOG_CONTENT", "offset"=>82839449, "host"=>{"name"=>"localhost.localdomain"}, "@version"=>"1", "beat"=>{"name"=>"localhost.localdomain", "version"=>"6.4.0", "hostname"=>"localhost.localdomain"}, "tags"=>["beats_input_codec_plain_applied"], "source"=>"/home/logstash/app/testlog_10k/test1.log", "input"=>{"type"=>"log"}}}

After that, there is a log line like Pushing flush onto pipeline. I believe that during each 20-second interval, Logstash's beats input plugin is still processing the received log data. With 10 KB per line, 2048 lines is only about 20 MB; why is the beats input plugin so slow to handle this?
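For reference, the 2048-events-per-batch behaviour presumably comes from the bulk_max_size setting of Filebeat's Logstash output, which defaults to 2048 in 6.x. The relevant part of filebeat.yml would look roughly like this (the host is just a placeholder):

output.logstash:
  # placeholder host, adjust to your environment
  hosts: ["logstash-host:5044"]
  # maximum number of events per batch sent to Logstash; 2048 is the 6.x default
  bulk_max_size: 2048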

I used Wireshark to inspect the network transfer between the Filebeat server and the Logstash server, and found that the 20 MB of data was transferred within 1 to 2 seconds. I also tested with Logstash's file input plugin on the same log file, and Logstash processed the data very quickly.

So it seems the beats input plugin becomes very slow when dealing with log files that have large lines. What can I do to avoid this? Hoping to get your help :grinning:

Here is the environment I use:

Server: RHEL 6.8 with an 8-core CPU

Version: Filebeat 6.4 and Logstash 6.4

Hello everyone, I debugged the code and found the problem. The V2Batch class of the beats input plugin has a problem in its buffer allocation logic:

// grow the internal buffer just enough to hold the next message plus two int fields
if (internalBuffer.writableBytes() < size + (2 * SIZE_OF_INT)){
    internalBuffer.capacity(internalBuffer.capacity() + size + (2 * SIZE_OF_INT));
}

This code causes the capacity method to be executed for almost every message received. Each call allocates a larger buffer and copies everything accumulated so far into it, so the operation takes longer and longer as the batch fills up.
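To show why this hurts, here is a small self-contained Java sketch (plain arrays instead of Netty's ByteBuf; the class name and sizes are made up for illustration) that compares growing a buffer by exactly the bytes needed per message with growing it geometrically:

// Illustration only: growing the buffer by exactly the bytes needed for each message
// forces a full copy of everything received so far on almost every append, so total
// copy work grows roughly quadratically with the number of messages in the batch.
public class GrowthDemo {

    // Appends `messages` payloads of `messageSize` bytes, growing the buffer either
    // geometrically (doubling) or by exactly the amount needed. Returns the total
    // number of bytes copied by reallocations so the two strategies can be compared.
    static long appendAll(int messages, int messageSize, boolean geometric) {
        byte[] buffer = new byte[1024];
        int used = 0;
        long bytesCopied = 0;
        for (int i = 0; i < messages; i++) {
            int needed = used + messageSize;
            if (needed > buffer.length) {
                int newCapacity = geometric
                        ? Math.max(needed, buffer.length * 2) // grow geometrically
                        : needed;                             // grow just enough, like V2Batch
                byte[] bigger = new byte[newCapacity];
                System.arraycopy(buffer, 0, bigger, 0, used); // the memcpy that hurts
                bytesCopied += used;
                buffer = bigger;
            }
            used += messageSize; // pretend the message body was written here
        }
        return bytesCopied;
    }

    public static void main(String[] args) {
        int messages = 2048;          // Filebeat's default batch size
        int messageSize = 10 * 1024;  // ~10 KB per line, as in my test
        System.out.println("exact growth copied bytes:     " + appendAll(messages, messageSize, false));
        System.out.println("geometric growth copied bytes: " + appendAll(messages, messageSize, true));
    }
}

With 2048 messages of about 10 KB each, the exact-growth strategy ends up copying on the order of 20 GB in total, while geometric growth copies only a few tens of megabytes, which would explain the 20 to 30 second pauses I saw. I am not saying this is how the plugin should be fixed, just illustrating where the time goes.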

An issue about this problem is open on GitHub (Too many alloc/memcpy in V2Batch).
