Hello,
I'm trying to process events with Logstash and I'm running into slow event processing. There are around 100k records. In logstash.yml I've set log.level to debug.
So far I can see that around 11,000 records were processed in 2 hours. I want the events to be processed faster.
I'm testing on a test instance with the following configuration:
JVM heap: 2 GB
CPU: 4 cores
Logstash: 7.9.1
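For reference, this heap size comes from config/jvm.options in the Logstash install (only the relevant lines shown):

# initial and maximum heap size (currently 2 GB)
-Xms2g
-Xmx2g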
pipelines.yml:
- pipeline.id: tcgeometrytransfer
  queue.type: persisted
  path.config: "/l/custom/TCS/logstash/logstash-7.9.1/scripts/tcgeometry/tc_geometry.cfg"
Pipeline config file (tc_geometry.cfg):
input {
  exec {
    # run the extract script once a day at 05:57
    command => '/l/custom/TCS/logstash/logstash-7.9.1/scripts/tcsgeometry/run_tcs_geometry.sh'
    schedule => "0 57 05 * * *"
  }
}
filter {
  # only keep events whose message starts with a JSON object
  if [message] =~ "^\{.*\}[\s\S]*$" {
    json {
      source => "message"
      target => "parsed_json"
      remove_field => "message"
    }
    # fan the response array out into one event per element
    split {
      field => "[parsed_json][geoMonitorResponse]"
      target => "geometry"
      remove_field => [ "parsed_json" ]
    }
    # use whichever monitorDate is present as @timestamp
    if [geometry][graph][monitorDate] {
      mutate {
        convert => { "[geometry][graph][monitorDate]" => "string" }
      }
      date {
        match => ["[geometry][graph][monitorDate]", "yyyy-MM-dd'T'HH:mm:ssZ"]
        timezone => "UTC"
        target => "@timestamp"
      }
    }
    if [geometry][position][monitorDate] {
      mutate {
        convert => { "[geometry][position][monitorDate]" => "string" }
      }
      date {
        match => ["[geometry][position][monitorDate]", "yyyy-MM-dd'T'HH:mm:ssZ"]
        timezone => "UTC"
        target => "@timestamp"
      }
    }
    if [geometry][line][monitorDate] {
      mutate {
        convert => { "[geometry][line][monitorDate]" => "string" }
      }
      date {
        match => ["[geometry][line][monitorDate]", "yyyy-MM-dd'T'HH:mm:ssZ"]
        timezone => "UTC"
        target => "@timestamp"
      }
    }
  } else {
    drop { }
  }
}
output {
  elasticsearch {
    hosts => "http://abc:9200"
    ilm_pattern => "{now/d}-000001"
    ilm_rollover_alias => "cis-monitor-geometry"
    ilm_policy => "tcs-monitor-geometry-policy"
    # upsert on the record's unique id so reruns update instead of duplicating
    doc_as_upsert => true
    document_id => "%{[geometry][uniqueId]}"
  }
}
It would also be helpful if someone could suggest the best setup for the attributes below. I'm currently using the defaults, and I'm not sure what batch size and delay would be best so that log processing is fast:
pipeline.batch.size: 125
pipeline.batch.delay: 50
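From what I understand, these can be set globally in logstash.yml or overridden per pipeline in pipelines.yml. A sketch of the kind of change I have in mind (the numbers are illustrative, not values I know to be good):

- pipeline.id: tcgeometrytransfer
  queue.type: persisted
  path.config: "/l/custom/TCS/logstash/logstash-7.9.1/scripts/tcgeometry/tc_geometry.cfg"
  pipeline.workers: 4         # defaults to the number of CPU cores
  pipeline.batch.size: 1000   # events each worker collects before running filters/outputs
  pipeline.batch.delay: 50    # ms to wait for a batch to fill before flushing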
Some questions:
1) If the batch size is increased to 1000 events, what should the batch delay be? I'm not sure how this works; my intention in increasing it is to get events processed faster. What other resources would that require, e.g. would it need more cores or a larger JVM heap?
2) For another use case where the data is around 17,000 records, all data is processed in 30 minutes. What is the reason that all 17,000 processed records appear in the index together, i.e. everything is inserted into the index at once rather than incrementally? I'd also like to confirm that if batch size and delay are not set, processing uses the default size and delay. What might be the reason the data does not arrive in the index batch by batch, but instead all at once? This really confuses me.
3) Do pipeline.batch.size and pipeline.batch.delay not work with the persisted queue?
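For context, these are the persisted-queue settings in logstash.yml that I'm aware of (sizes here are illustrative, not my current values):

queue.type: persisted
queue.max_bytes: 4gb        # total on-disk capacity of the queue (default is 1024mb)
queue.page_capacity: 64mb   # size of each queue page file (default is 64mb)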
I would be interested to know how Logstash can best be configured to process events faster and with optimal settings.
Thanks