A large amount of Logstash inputs

ChiBiPonD · July 8, 2015, 3:35pm

I would like each machine to send 30k events per second to Elasticsearch. I have set up 5 Elasticsearch master nodes. I want to collect events from 20 machines at a total rate of 600k per second, so each machine needs to deliver 30k events per second. Any suggestions? The config I tried is shown below.

    input {
        exec {
            command => "echo hello1 $(date +'%d/%m/%Y %H:%M:%S:%3N') &"
            interval => 30
            type => "loadavg1"
        }
        exec {
            command => "echo hello2 $(date +'%d/%m/%Y %H:%M:%S:%3N') &"
            interval => 30
            type => "loadavg2"
        }
        ...
        exec {
            command => "echo hello30000 $(date +'%d/%m/%Y %H:%M:%S:%3N') &"
            interval => 30
            type => "loadavg30000"
        }
    }
    output {
        elasticsearch {
            host => "test01"
            cluster => "ccm_elasticsearch" # this matches our elasticsearch cluster.name
            protocol => "http"
        }
    }

Thank you in advance

magnusbaeck · July 8, 2015, 6:02pm

If you want to test the ability to generate and process 30k events/s, having 30k exec inputs that could potentially all attempt to fork off a process at the same time is a really bad idea. Well, it's a bad idea regardless. How will the events be generated when you actually deploy this?

(Besides, 30k events every 30 seconds is just 1k events/second, so I'm not sure what you're even trying to do here.)

ChiBiPonD · July 8, 2015, 7:07pm

In the real deployment I plan to have one process, or 50 processes, writing the values to files, and then configure Logstash to read from those log files (30,000 values per machine). I am trying to find the best way to push the values at a rate of 600k per second. I agree with you that 30k exec inputs is a bad idea. However, right now I don't know the best way to push 600,000 values per second from all 20 machines, or 30k events per second per machine. Please suggest some solutions.

magnusbaeck · July 8, 2015, 7:13pm

How about modelling what you're actually trying to do, i.e. having Logstash read 50 files (probably best with more than one file input to improve parallelism) to which other processes (imitating your real event sources) are writing data?
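A minimal sketch of such a test setup, assuming the 50 files are split across two hypothetical directories (`/data/values/a` and `/data/values/b`) and that the sincedb locations shown are writable. The paths and the `type` value are placeholders, not anything from your environment. The output section reuses the settings from the original config.

```
input {
  # First file input: tails the files in the first (hypothetical) directory.
  file {
    path => "/data/values/a/*.log"
    start_position => "beginning"                 # read existing content of files not seen before
    sincedb_path => "/var/tmp/sincedb_values_a"   # assumed location for read-position tracking
    type => "values"
  }
  # Second file input: tails the remaining files.
  file {
    path => "/data/values/b/*.log"
    start_position => "beginning"
    sincedb_path => "/var/tmp/sincedb_values_b"
    type => "values"
  }
}
output {
  elasticsearch {
    host => "test01"
    cluster => "ccm_elasticsearch"
    protocol => "http"
  }
}
```

How many file inputs give the best throughput depends on the machine, so treat two as a starting point to measure against rather than a recommendation.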

ChiBiPonD · July 8, 2015, 8:07pm

What I am actually trying now:

1. One process pushes one value at a time to a single log file until it reaches 30k values (right now the process just generates numbers in a for loop).
2. 50 processes push values, each to its own log file. The Logstash config file has file inputs for them, e.g. output1, output2, ..., output50.

These are the two approaches I'm testing now, but pushing the values still takes more than 1 second. To be more specific, I get only 90k values per second. Maybe my processes are what is limiting the rate, so I tried exec inputs, which is a bad idea. I don't know how to push values faster to increase the rate. My only thought right now is that all 30k values should be pushed at the same time. Thank you, Mr. magnusbaeck, for helping me.

magnusbaeck · July 8, 2015, 8:17pm

Are you saturating your CPUs, or what seems to be the limiting factor? I can totally imagine that a single Logstash instance might have problems pushing more than 90k events/s.

ChiBiPonD · July 8, 2015, 8:46pm

The CPUs are saturated sometimes. Memory is still free. Is it possible to run multiple Logstash instances? For example, run the first instance for the first 25 log files and the second instance for the remaining 25 log files.

warkolm · July 8, 2015, 10:42pm

You can use -w to tell LS to spawn more workers, but that will only help if you have spare CPU.

magnusbaeck · July 9, 2015, 3:46am

> Is it possible to run multiple Logstash instances? For example, run the first instance for the first 25 log files and the second instance for the remaining 25 log files.

Yeah, sure.

ChiBiPonD · July 9, 2015, 8:51am

Is this different from running Logstash 2 times?

warkolm · July 10, 2015, 1:01am

You mean two instances?

Yes and no. If you run two instances you get more throughput, but at the cost of extra memory because two JVMs are running. Running more workers will be more efficient in that regard, as they are all managed by the one JVM.
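For illustration, a rough sketch of the two-instance split, assuming two hypothetical config files (`first.conf` and `second.conf`) that differ only in which files they read. The directory layout and sincedb paths are made up for the example. Each file would be passed to its own Logstash process with `-f`.

```
# --- first.conf (hypothetical): instance 1 reads the first 25 log files ---
input {
  file {
    path => "/data/values/a/*.log"                # assumed directory holding files 1-25
    sincedb_path => "/var/tmp/sincedb_instance1"  # separate read-position file per instance
  }
}
output {
  elasticsearch {
    host => "test01"
    cluster => "ccm_elasticsearch"
    protocol => "http"
  }
}

# --- second.conf (hypothetical): instance 2 reads the remaining 25 log files ---
input {
  file {
    path => "/data/values/b/*.log"                # assumed directory holding files 26-50
    sincedb_path => "/var/tmp/sincedb_instance2"
  }
}
output {
  elasticsearch {
    host => "test01"
    cluster => "ccm_elasticsearch"
    protocol => "http"
  }
}
```

Giving each instance its own file set and its own sincedb file keeps the two processes from reading the same data or overwriting each other's read positions.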
