I'm trying to process events with Logstash and I'm facing slow event processing. There are around 100k records. In logstash.yml I've enabled log.level: debug.
So far I can observe that in 2 hours only around 11,000 records were processed. I want events to be processed faster.
I'm testing it on a test instance with the following config:
JVM heap: 2 GB
It would also be helpful if someone could suggest the best setup for the attributes below.
I'm currently using the defaults, and I'm not sure what batch size and batch delay to set so that log processing is fast.
1) If the batch size is increased to 1000 events, what should the batch delay be? I'm not sure how this works; the intention is to increase it so that events are processed faster.
What other resources would be needed for that, e.g. more cores, or a larger JVM heap?
2) In another use case, where the data is around 17,000 records, everything is processed in 30 minutes. Why does all of the processed data appear in the index together, i.e. all data is inserted into the index at once rather than incrementally?
I'd also like to confirm that when batch size and delay are not set, processing uses the default size and delay. What might be the reason that data does not arrive in the index batch by batch, but instead arrives all at once? It's really confusing.
3) Do batch size and batch delay not work with a persisted queue?
I'd be interested to know how Logstash can best be configured to process events faster and with optimal settings.
It might be that tc_geometry.cfg is slow; 11,000 records in 2 hours is far too slow for Logstash. That amount should be processed inside the filter/pipeline in a minute or less. Do you have any logging on the .sh side? If you don't, add some: just start/end times per line, something basic.
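A minimal sketch of that kind of logging; the sleep is a stand-in for the real work (e.g. the script invocation from this thread):

```shell
#!/bin/sh
# Basic start/end timing around a processing step.
start=$(date +%s)
printf 'start: %s\n' "$(date '+%F %T')"

sleep 1   # placeholder for the actual processing command

end=$(date +%s)
elapsed=$((end - start))
printf 'end: %s (elapsed: %ss)\n' "$(date '+%F %T')" "$elapsed"
```

Even timestamps this coarse will show whether the time is going into the script or into the Logstash pipeline itself.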
At a glance, the json plugin should consume the most time; the others are basic operations.
Increasing memory is useful if you have a large amount of data, e.g. an XML structure with 10,000-50,000 nodes.
Thanks for your time looking into this and for pointing out the stats plugin. I will add the id and check with the logger as you mentioned. The filter plugin seems to be working fine; the current challenge is that in the pipeline I've set batch size 1000 and batch delay 200, but it doesn't seem to process 1000 events at a time. I'm using queue.type: persisted, and even when I changed to queue.type: memory it still doesn't pick up 1000 events.
As you recommended, I will add the id and check execution time with the stats plugin. That test is still pending; I'll post the latest outcome.
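For reference, this is roughly what adding an id to a filter looks like (the json filter and the field names here are assumptions about the pipeline, not its actual config):

```
filter {
  json {
    # custom id, so this plugin appears by name in the node stats output
    id     => "parse_event_json"
    source => "message"
  }
}
```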
You may have already looked into this, but I'm curious: have you verified that 1000 events (your batch size) actually arrive within your batch delay, per worker thread? These settings are per pipeline worker thread. What is your pipeline.workers set to?
My speculation is that events may not be generated fast enough on the source/input side to fill up 1000 events (the batch size) per thread within the batch delay, so each batch just processes whatever is available in the batch queue.
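To make the per-worker arithmetic concrete (the values below are illustrative, not recommendations):

```yaml
# logstash.yml -- illustrative values only
pipeline.workers: 8          # defaults to the number of CPU cores
pipeline.batch.size: 1000    # per worker, so up to 8 x 1000 = 8000 in-flight events
pipeline.batch.delay: 50     # ms; a partially filled batch is flushed after this wait
# In-flight memory is roughly workers x batch.size x average event size,
# so a larger batch size may also call for a larger heap than 2 GB.
```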
I share the same opinion as Sunile. Without a logger inside run_tcs_geometry.sh, it's hard to say.
In general, increasing memory will help only in the case of large data. Pipeline reconfiguration is useful when you're processing an enormous number of messages.
@PRASHANT_MEHTA have you checked the pipeline stats? Can you post the JSON with customized IDs?
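A quick way to pull those pipeline stats is the monitoring API; the host/port below assume the default API binding:

```shell
# Logstash node stats endpoint; per-plugin entries are keyed by the custom ids
url='http://localhost:9600/_node/stats/pipelines?pretty'
# curl -s "$url"   # run while Logstash is up to see event counts and durations per plugin
echo "$url"
```

The `duration_in_millis` per plugin in that output is usually the fastest way to spot which filter is eating the time.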