Hello,
I'm trying to process events with Logstash and I'm running into slow event processing. There are around 100k records. In logstash.yml I've set log.level to debug.
So far I can see that around 11,000 records were processed in 2 hours. I want the events to be processed faster.
I'm testing on a test instance with the following configuration:
JVM heap: 2 GB
CPU: 4 cores
Logstash: 7.9.1
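For reference, this heap size comes from config/jvm.options in the Logstash install (only the relevant lines shown):

# initial and maximum heap size (currently 2 GB)
-Xms2g
-Xmx2g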
pipelines.yml:
- pipeline.id: tcgeometrytransfer
  queue.type: persisted
  path.config: "/l/custom/TCS/logstash/logstash-7.9.1/scripts/tcgeometry/tc_geometry.cfg"
Pipeline config file (tc_geometry.cfg):
input {
  exec {
    # run the extract script once a day at 05:57
    command => '/l/custom/TCS/logstash/logstash-7.9.1/scripts/tcsgeometry/run_tcs_geometry.sh'
    schedule => "0 57 05 * * *"
  }
}
filter {
  # only keep events whose message starts with a JSON object
  if [message] =~ "^\{.*\}[\s\S]*$" {
    json {
      source => "message"
      target => "parsed_json"
      remove_field => "message"
    }
    # fan the response array out into one event per element
    split {
      field => "[parsed_json][geoMonitorResponse]"
      target => "geometry"
      remove_field => [ "parsed_json" ]
    }
    # use whichever monitorDate is present as @timestamp
    if [geometry][graph][monitorDate] {
      mutate {
        convert => { "[geometry][graph][monitorDate]" => "string" }
      }
      date {
        match => ["[geometry][graph][monitorDate]", "yyyy-MM-dd'T'HH:mm:ssZ"]
        timezone => "UTC"
        target => "@timestamp"
      }
    }
    if [geometry][position][monitorDate] {
      mutate {
        convert => { "[geometry][position][monitorDate]" => "string" }
      }
      date {
        match => ["[geometry][position][monitorDate]", "yyyy-MM-dd'T'HH:mm:ssZ"]
        timezone => "UTC"
        target => "@timestamp"
      }
    }
    if [geometry][line][monitorDate] {
      mutate {
        convert => { "[geometry][line][monitorDate]" => "string" }
      }
      date {
        match => ["[geometry][line][monitorDate]", "yyyy-MM-dd'T'HH:mm:ssZ"]
        timezone => "UTC"
        target => "@timestamp"
      }
    }
  } else {
    drop { }
  }
}
output {
  elasticsearch {
    hosts => "http://abc:9200"
    ilm_pattern => "{now/d}-000001"
    ilm_rollover_alias => "cis-monitor-geometry"
    ilm_policy => "tcs-monitor-geometry-policy"
    # upsert on the record's unique id so reruns update instead of duplicating
    doc_as_upsert => true
    document_id => "%{[geometry][uniqueId]}"
  }
}
It would also be helpful if someone could suggest the best setup for the attributes below. I'm currently using the defaults, and I'm not sure what batch size and delay would be best so that log processing is fast:
pipeline.batch.size: 125
pipeline.batch.delay: 50
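From what I understand, these can be set globally in logstash.yml or overridden per pipeline in pipelines.yml. A sketch of the kind of change I have in mind (the numbers are illustrative, not values I know to be good):

- pipeline.id: tcgeometrytransfer
  queue.type: persisted
  path.config: "/l/custom/TCS/logstash/logstash-7.9.1/scripts/tcgeometry/tc_geometry.cfg"
  pipeline.workers: 4         # defaults to the number of CPU cores
  pipeline.batch.size: 1000   # events each worker collects before running filters/outputs
  pipeline.batch.delay: 50    # ms to wait for a batch to fill before flushing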
Some questions:
1) If the batch size is increased to 1000 events, what should the batch delay be? I'm not sure how this works; my intention in increasing it is to get events processed faster. What other resources would that require, e.g. would it need more cores or a larger JVM heap?
2) For another use case where the data is around 17,000 records, all data is processed in 30 minutes. What is the reason that all 17,000 processed records appear in the index together, i.e. everything is inserted into the index at once rather than incrementally? I'd also like to confirm that if batch size and delay are not set, processing uses the default size and delay. What might be the reason the data does not arrive in the index batch by batch, but instead all at once? This really confuses me.
3) Do pipeline.batch.size and pipeline.batch.delay not work with the persisted queue?
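For context, these are the persisted-queue settings in logstash.yml that I'm aware of (sizes here are illustrative, not my current values):

queue.type: persisted
queue.max_bytes: 4gb        # total on-disk capacity of the queue (default is 1024mb)
queue.page_capacity: 64mb   # size of each queue page file (default is 64mb)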
I would be interested to know how Logstash can best be configured to process events faster and with optimal settings.
Thanks