Events being stuck in a mutade

I notice that sometime my logstash (5.6, i can't upgrade yet to 6.x) performance start to drop and if i let it continue, it will start to get a very erratic behavior... i suspect that the java garbage collector is always kicking in, consuming most of the logstash time.

if i do a logstash API call, to show the pipelines, i can see that a mutate have more events coming in than coming out, so i Id all the mudates and created this small script to show the difference between input and output values:

$ while true; do curl -XGET 'localhost:9600/_node/stats/pipeline?pretty' -s | jq . | awk 'BEGIN {FS="{|}|: |\," } /"in":/ { i=$2 ; next }; /"out":/ { o=$2 ; next } /"id":/ { id=$2 }; /name/ { print $0, i-o , id }  ' | column -t | grep -v ' 0 ' ; sleep 2; echo ; done 

(...)
"name": "kafka" 25713782 "13d304fd6a045897056d260305bd8e188d2cac62-3"
"name": "kafka" 23625720 "13d304fd6a045897056d260305bd8e188d2cac62-4"
"name": "beats" 27482720 "13d304fd6a045897056d260305bd8e188d2cac62-1"
"name": "kafka" 23210061 "13d304fd6a045897056d260305bd8e188d2cac62-5"
"name": "tcp" 27482720 "13d304fd6a045897056d260305bd8e188d2cac62-2"
"name": "heartbeat" 27482464 "13d304fd6a045897056d260305bd8e188d2cac62-6"
"name": "beats", 9898853 "13d304fd6a045897056d260305bd8e188d2cac62-7"
"name": "drop" 158 "13d304fd6a045897056d260305bd8e188d2cac62-35"
"name": "drop" 4810 "13d304fd6a045897056d260305bd8e188d2cac62-103"
"name": "clone" -10504 "13d304fd6a045897056d260305bd8e188d2cac62-96"
"name": "drop" 58014 "13d304fd6a045897056d260305bd8e188d2cac62-34"
"name": "mutate" 4810 "98-others.conf-4"
"name": "metrics" 56994 "13d304fd6a045897056d260305bd8e188d2cac62-36"
"name": "elasticsearch" 1998 "global"
(...)

All Inputs, like kafka, beats,metric, heartbeat and tcp will have only input values, so that is normal to have high values...
Drop will of course lose the event, so no output and also a high value...
Clone will in turn output 2 events for one input, so negative value is also ok.
Elasticsearch output will queue some events and flush, so that one is always changing....
But then we have the mutate (98-others.conf file, 4º mutate in file) that is always increasing the number of events, showing that some events enter the mutate, but never leave it!

The mutate itself is very simple:

mutate {
  id => "98-others.conf-4"
  add_field => { "logstash" => "server-a03" }
}

It is just adding in what logstash machine the event was processed, so i can later make performance graphics per machine or debug any logstash problem.

I have no idea why it is locking up. If i remove this mutate, the next mutate will then start to lock up events, so it doesn't look like this mutate is really the cause, only the trigger. both mutate are global, apply to all events, as i have many other mutates inside of if tests earlier

I'm suspecting that some event is being dropped earlier, but in some performance optimization, the mutate is already being applied to that dropped events and the mutate will end with some leak from losing the event while applying the change

So is this a bug, a known issue and anyone knows any workaround this?

Thanks for the help
Daniel

EDIT:
i just notice that the number of events in one of the drop is exactly the same as the ones in the mutate, so this further links the two, as i suspected

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.