It's not easy to get details from your posts. To summarize:
- 23 pipelines + 3 under heavy load, on 16 cores & 32 GB memory
- ~10 ms event processing time for the heavily loaded pipelines
- everything is on 1 LS host
- pipeline.workers and pipeline.batch.size are the defaults? (see the pipelines.yml sketch after the questions below)
pipeline.workers # default is the number of cores, so LS will use max 16
pipeline.batch.size: 125
pipeline.batch.delay: 5
pipeline.ordered: auto
- What are your Xms and Xmx values in jvm.options? (example below)
- Are all settings the same for all pipelines? What is specific to those 3 pipelines?
- Are you using the memory queue or the persistent queue?
- How many ES data nodes are you using?
- Have you checked the ES logs? Especially because of the slow inserts and the DLQ.
- What is the avg/max message size in those heavily loaded pipelines?
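On the defaults question: the per-pipeline values can be overridden in pipelines.yml instead of globally in logstash.yml. A minimal sketch, assuming hypothetical pipeline IDs and config paths (heavy-1 and light-1 are placeholders, not your names):

- pipeline.id: heavy-1
  path.config: "/etc/logstash/conf.d/heavy-1.conf"  # hypothetical path
  pipeline.workers: 8        # give the heavy pipelines more of the 16 cores
  pipeline.batch.size: 250   # larger batches only where the load is heavy
- pipeline.id: light-1
  path.config: "/etc/logstash/conf.d/light-1.conf"  # hypothetical path
  pipeline.workers: 2        # keep the light pipelines small
  pipeline.batch.size: 125   # default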
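For the heap question, the two lines to check in jvm.options look like this; 4g is only an illustration, not a recommendation for your host:

-Xms4g
-Xmx4g

Xms and Xmx should be set to the same value, and enough memory must be left for the OS and the other processes on the host.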
Since you said "my configuration was simple" and there is not much code in the filters, you can try the following (a full output example is sketched after the settings):
Edit: make a backup before any changes.
pipeline.batch.size: 250 # only for the heavily loaded pipelines, see the pipelines.yml sketch above
compression_level => 5 # or increase further to reduce network load
ssl_enabled => true # yes, this should be the default
pool_max => 2000 # increase to reduce connection reopening
pool_max_per_route => 200
ssl_supported_protocols => "TLSv1.3" # use only 1.3, it should be faster to establish the secure channel
resurrect_delay => 2
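To show where the output options above go, here is a minimal sketch of the elasticsearch output; the hosts and index name are hypothetical placeholders, the values mirror the suggestions above:

output {
  elasticsearch {
    hosts => ["https://es-data-1:9200", "https://es-data-2:9200"]  # data nodes only, no dedicated masters
    index => "my-heavy-index-%{+YYYY.MM.dd}"  # hypothetical index name
    ssl_enabled => true
    ssl_supported_protocols => ["TLSv1.3"]
    compression_level => 5
    pool_max => 2000
    pool_max_per_route => 200
    resurrect_delay => 2
  }
}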
- Exclude dedicated master nodes from the hosts list.
- Check the ES logs (all nodes) to find out why you are getting dlq_routed; you can do it manually, or collect them with Metricbeat or Elastic Agent. The DLQ counters can also be checked from the LS side, see the stats call after this list.
- Use sniffing mode, check this thread. A sniffing sketch follows below.
- Investigate the LS statistics for all pipelines, see the node stats call below.
- Check the value of tcp_keepalive_time (command below), only check it, do not touch anything on the OS level.
- Allocate 2-3 ES nodes only to the heavily loaded pipelines; the other pipelines should use the remaining nodes.
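For the pipeline statistics and the DLQ counters, the Logstash node stats API is the quickest check. A sketch, assuming the default API port 9600:

curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
# per pipeline, compare events.duration_in_millis to events.out (avg cost per event),
# look for the slowest filter/output plugins, queue backpressure, and the
# dead_letter_queue section (its size grows when events are routed to the DLQ)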
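Sniffing is just another option on the same elasticsearch output; a minimal sketch, with a hypothetical seed host and an example delay:

output {
  elasticsearch {
    hosts => ["https://es-data-1:9200"]  # hypothetical seed host
    sniffing => true        # discover the other cluster nodes automatically
    sniffing_delay => 10    # seconds between sniffs, example value
  }
}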
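On Linux the keepalive value can be read (not changed) like this:

sysctl net.ipv4.tcp_keepalive_time
# or: cat /proc/sys/net/ipv4/tcp_keepalive_time
# the default is usually 7200 seconds; just note the value, do not change it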
This is not a simple optimization activity since it's on live data & load, where the Jedi council doesn't have full information or access. I truly hope other Jedi will give their own opinions.
Have you used live pipeline monitoring in Kibana? If it's not already enabled, you should set it up:
PUT _cluster/settings
{
"persistent": {
"xpack.monitoring.collection.enabled": true
}
}