Logstash starting damn slow

Hi,

I need some guidance on speeding up the startup of Logstash - or at least I would like to know why it takes so long and whether that is normal.

[2018-03-27T07:18:48,795][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://139.1.117.41:19200/"}
[2018-03-27T07:18:48,804][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-03-27T07:18:48,804][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-03-27T07:18:48,810][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-03-27T07:18:48,811][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-03-27T07:18:48,815][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//139.1.117.41:19200"]}
[2018-03-27T07:18:48,939][INFO ][logstash.inputs.redis    ] Registering Redis {:identity=>"redis://@139.1.117.41:6379/0 list:tapdispatcher"}
[2018-03-27T07:18:49,574][INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"tapdispatcher", :thread=>"#<Thread:0x7f00ea0e run>"}
[2018-03-27T07:19:18,429][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"tmadmin_usercount", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-27T07:19:18,466][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://139.1.117.41:19200/]}}
[2018-03-27T07:19:18,467][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://139.1.117.41:19200/, :path=>"/"}
[2018-03-27T07:19:18,472][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://139.1.117.41:19200/"}
[2018-03-27T07:19:18,755][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-03-27T07:19:18,756][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-03-27T07:19:19,015][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-03-27T07:19:19,017][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}

I have a redis instance as message broker. Each logfile type is mapped to a redis key (list).
Logstash has multiple pipelines (one per key).
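
Each pipeline's input is basically just a redis list consumer; a minimal sketch (using the tapdispatcher key from the log above):

input {
  redis {
    host => "139.1.117.41"
    port => 6379
    data_type => "list"
    key => "tapdispatcher"   # one redis list per logfile type
  }
}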

It takes about 30 seconds to start up a pipeline. The interesting thing is that the time is lost between the log lines "started" and "starting": there is a delay after pipeline n has finished starting before pipeline n+1 begins to start.

Running on Red Hat 7 in Docker, but I have the same issue when running it classically outside of Docker.

No ideas? :frowning:

Logstash starts slowly PERIOD! :frowning:
30 seconds is actually pretty quick. Some of our more complex pipelines can take 3-5 minutes.

NOTE: This is one of the reasons why it is wise to split Logstash into "collection" instances and "processing" instances, with some sort of queuing in between such as Redis or Kafka. This allows you to make changes to your processing instances and restart them without missing any incoming data due to long restart times.
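
For illustration, a minimal version of that split could look like this (port, key, and paths are placeholders, not taken from your setup):

# collection instance: accept events and push them into redis unmodified
input {
  beats {
    port => 5044
  }
}
output {
  redis {
    host => "139.1.117.41"
    data_type => "list"
    key => "raw_events"
  }
}

# processing instance: consume from redis, parse, and index
input {
  redis {
    host => "139.1.117.41"
    data_type => "list"
    key => "raw_events"
  }
}
filter {
  # grok, date parsing, enrichment, etc.
}
output {
  elasticsearch {
    hosts => ["http://139.1.117.41:19200"]
  }
}

The processing instance can then be restarted freely; events simply accumulate in the redis list until it comes back up.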

You don't mention which version of Logstash you are using. If it is prior to 5.5.0 then you can speed up start times by adding the following to your jvm.options file.

# Entropy source for randomness
-Djava.security.egd=file:/dev/urandom

That was the only trick in the past, and it has been the default since 5.5.0.

Currently I have 30 pipelines. A restart of the whole Logstash instance takes about 15-30 minutes.
When I had everything in one pipeline it also took a long time, but I think it was about 3-10 minutes.

So each additional pipeline adds that much restart overhead? :frowning:

Using version 6.2.3.

A few questions...

  1. Which OS?
  2. How much heap have you given Logstash?
  3. How much logic is duplicated across pipelines?

Red Hat 7.

-Xms2g
-Xmx2g

Not really much. We are currently parsing 30 completely different log types, which are produced by our application.

What is equal or similar between the pipelines?

  • Input is quite similar: same redis instance, but each logfile type is currently stored in a different key.
  • Output is quite similar: points to Elasticsearch, but depending on the logfile type it chooses a different index.
  • One part is identical in all pipelines: the first filter after the input stamps the event with a time tick, and the last filter before the output takes another tick and calculates the time the event spent in Logstash - used for debugging and performance analysis (see the sketch after this list).
  • The parsing complexity differs per log type. Some involve quite complex ruby stuff, others are only some grokking and timestamp conversion.
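
The tick filters are just a pair of ruby filters, roughly like this (field names are my own; [@metadata] keeps the intermediate timestamp out of the indexed document):

# first filter after the input
filter {
  ruby {
    code => "event.set('[@metadata][ls_received]', Time.now.to_f)"
  }
}

# last filter before the output
filter {
  ruby {
    code => "event.set('ls_duration', Time.now.to_f - event.get('[@metadata][ls_received]'))"
  }
}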

We seem to be running very similar setups. I am on CentOS 7. I actually had over 40 pipelines running at one point, a dozen of which would be considered very complex. I also use redis for "inter-pipeline" communication. Here are some ways to improve performance.

  1. You really need to give Logstash more heap. 6-8GB would be a starting point. This alone will probably improve your start times dramatically.

  2. Use the pipeline.workers setting for each pipeline to reduce the number of threads. If you have not set this, the default is one worker per CPU core, so if your machine has a lot of cores you might literally have 500-700 JVM threads or more. On a 4/8 core CPU I had nearly 500 threads at one point; after tuning, this dropped to around 250 (probably still a lot) and Logstash runs better. A simple UDP input collecting raw syslog and dumping it into redis doesn't need more than 2 workers, while something like the netflow codec uses a lot of CPU and benefits from more workers. Simple inputs and outputs need fewer resources. Reducing the number of workers eliminates the JVM overhead and context switching those threads cause (see the pipelines.yml sketch after this list).

  3. Our solution actually breaks common logic out into its own pipeline. We use values within our data model to route messages to the pipelines that the message requires. For example, all messages that represent a network connection... sFlow, Netflow, IPFIX, Fortinet, Palo Alto, Cisco ASA, NGINX access logs and more... are first processed by their own source-specific pipelines. They are then all processed by a common "connection post-processor" before being sent to Elasticsearch. Running this very complex pipeline only once, instead of embedding it in each individual source-specific pipeline, GREATLY reduces resource requirements (see the routing sketch below).

  4. Not really related to your performance issue, but if you are using redis between "collection" and "processing" pipelines, I would recommend running the collection pipelines in a separate Logstash instance. This will allow you to restart the processing instance, e.g. when pipeline configs change significantly, without missing collection of the raw data.
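
For #1 that just means raising -Xms/-Xmx in jvm.options (e.g. -Xms6g / -Xmx6g). For #2, pipeline.workers is set per pipeline in pipelines.yml; a sketch with made-up ids and paths:

- pipeline.id: syslog_to_redis
  path.config: "/etc/logstash/conf.d/syslog_to_redis.conf"
  pipeline.workers: 2     # trivial input -> redis, few threads needed
- pipeline.id: netflow
  path.config: "/etc/logstash/conf.d/netflow.conf"
  pipeline.workers: 8     # CPU-heavy codec, give it more workers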

From my experience I really think that #1 is your primary problem. #2 will also help. And if you have a lot of "common logic" between pipelines, using methods like #3 could be of benefit.
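
As a rough sketch of the #3 hand-off, the end of each source-specific pipeline routes by a field in the data model (field and key names here are invented, not our actual config):

output {
  if [event_type] == "connection" {
    # the shared post-processor consumes this list and does the common logic once
    redis {
      host => "127.0.0.1"
      data_type => "list"
      key => "connection_postprocess"
    }
  } else {
    elasticsearch {
      hosts => ["http://139.1.117.41:19200"]
    }
  }
}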


Thanks, I will start testing with different heap sizes.
I also thought about #2 and will test it shortly.

By the way, we have only Filebeat as shipper. We use its redis output to push data into redis directly, so currently there seems to be no need for a collector Logstash instance.
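
The relevant Filebeat side is just the redis output, more or less like this (the key is chosen per logfile type; Filebeat's keys option can do that conditionally):

output.redis:
  hosts: ["139.1.117.41:6379"]
  datatype: "list"
  db: 0
  key: "tapdispatcher"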

Good idea to run events through more than one pipeline where complex logic is duplicated. I will keep it in mind, but I think it's not needed yet.

@asp I would love to hear how tweaking heap size and workers has helped. Please share here if you don't mind.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.