Increasing logstash throughput for the file input and file output


(Puneet Sharma) #1

Dear all,

How can I increase Logstash throughput for the file input and file output plugins? I am running Logstash on Windows. My sample test produces a 1.7 GB log file via a StreamWriter, and I am moving these log messages via Logstash to another location on the same machine (and on the same drive). Moving the 1.7 GB of content takes Logstash more than 30 minutes. Is there any way to improve this time? Below are my sample Logstash configuration and the sample program that generates the logs.

---------------------Logstash configuration----------------------------------
input {
  file {
    path => "D:/Log/*"
    start_position => "beginning"
  }
}

output {
  file {
    path => "D:\SampleLog.txt"
    message_format => "%{message}"
    flush_interval => 0
  }
}
-------------------------Testing program to generate logs---------------------------------------------------------------
using System.IO;

class Program
{
    // Writes 100,000,000 short lines to the file that Logstash is watching.
    static void Main(string[] args)
    {
        using (StreamWriter writer = new StreamWriter(@"D:\Log\SampleLog.txt"))
        {
            for (int i = 0; i < 100000000; i++)
            {
                writer.WriteLine("Testing: " + i);
            }
        }
    }
}

My machine has 16 GB of RAM and an Intel i7 processor.

Thanks,
Puneet


(Mark Walkom) #2

How many workers are you giving to Logstash when starting it?


(Christian Dahlqvist) #3

Why are you flushing for every single message? What does performance look like with the default setting?


(Puneet Sharma) #4

@warkolm: I believe it's the default. I haven't specified any worker parameters.


(Mark Walkom) #5

When you start Logstash you can define the number of workers; see the -w flag: https://www.elastic.co/guide/en/logstash/current/command-line-flags.html


(Christian Dahlqvist) #6

If there is no filter section in the configuration, I would not expect the number of filter workers to have an impact on performance.


(Puneet Sharma) #7

@Christian_Dahlqvist: I tried again after removing flush_interval => 0, and it still takes more than 30 minutes to sync the 1.8 GB file. I also agree that increasing the worker threads, as suggested by @warkolm, will not have any impact, since that setting only affects filter worker threads.
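
For reference, the configuration from the first post with flush_interval removed would look roughly like the sketch below. It assumes nothing else was changed for the retest. The paths and message_format are copied from the original configuration, and with the setting omitted the file output falls back to its default flush interval instead of flushing after every event.

```
input {
  file {
    path => "D:/Log/*"
    start_position => "beginning"
  }
}

output {
  file {
    path => "D:\SampleLog.txt"
    message_format => "%{message}"
    # flush_interval omitted (assumed to be the only change),
    # so the plugin default applies instead of a flush per event
  }
}
```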


(Christian Dahlqvist) #8

Using logstash for copying files seems like an unusual use case. What is the rationale behind this performance test?


(Puneet Sharma) #9

We have various client machines that generate logs as text files. We want to move all of these text logs to one centralized server, where they can be searched and provided to the user. My test case is just to check how fast the Logstash syncing process is with the file input and file output. The time Logstash takes to move the files is crucial as well.


(Mark Walkom) #10

Ahh yes, good point.


(Christian Dahlqvist) #11

How are the logs going to be searched once they have been moved to the central location? If you are planning on using Elasticsearch for this, there is generally no need to write the logs to a central location first; they can be processed and indexed directly into Elasticsearch.


(Puneet Sharma) #12

We are not planning to use Elasticsearch for now. In our initial plan, Logstash will move the logs in real time to the centralized server, and we will provide a basic search utility to query them, for example by folder name.

So the scope of our initiative is very limited: we want to use Logstash to move logs from different client machines to the centralized server.

