Events being stuck in a mutade

danielmotaleite · February 16, 2018, 6:16pm

I notice that sometime my logstash (5.6, i can't upgrade yet to 6.x) performance start to drop and if i let it continue, it will start to get a very erratic behavior... i suspect that the java garbage collector is always kicking in, consuming most of the logstash time.

if i do a logstash API call, to show the pipelines, i can see that a mutate have more events coming in than coming out, so i Id all the mudates and created this small script to show the difference between input and output values:

$ while true; do curl -XGET 'localhost:9600/_node/stats/pipeline?pretty' -s | jq . | awk 'BEGIN {FS="{|}|: |\," } /"in":/ { i=$2 ; next }; /"out":/ { o=$2 ; next } /"id":/ { id=$2 }; /name/ { print $0, i-o , id }  ' | column -t | grep -v ' 0 ' ; sleep 2; echo ; done

(...)
"name": "kafka" 25713782 "13d304fd6a045897056d260305bd8e188d2cac62-3"
"name": "kafka" 23625720 "13d304fd6a045897056d260305bd8e188d2cac62-4"
"name": "beats" 27482720 "13d304fd6a045897056d260305bd8e188d2cac62-1"
"name": "kafka" 23210061 "13d304fd6a045897056d260305bd8e188d2cac62-5"
"name": "tcp" 27482720 "13d304fd6a045897056d260305bd8e188d2cac62-2"
"name": "heartbeat" 27482464 "13d304fd6a045897056d260305bd8e188d2cac62-6"
"name": "beats", 9898853 "13d304fd6a045897056d260305bd8e188d2cac62-7"
"name": "drop" 158 "13d304fd6a045897056d260305bd8e188d2cac62-35"
"name": "drop" 4810 "13d304fd6a045897056d260305bd8e188d2cac62-103"
"name": "clone" -10504 "13d304fd6a045897056d260305bd8e188d2cac62-96"
"name": "drop" 58014 "13d304fd6a045897056d260305bd8e188d2cac62-34"
"name": "mutate" 4810 "98-others.conf-4"
"name": "metrics" 56994 "13d304fd6a045897056d260305bd8e188d2cac62-36"
"name": "elasticsearch" 1998 "global"
(...)

All Inputs, like kafka, beats,metric, heartbeat and tcp will have only input values, so that is normal to have high values...
Drop will of course lose the event, so no output and also a high value...
Clone will in turn output 2 events for one input, so negative value is also ok.
Elasticsearch output will queue some events and flush, so that one is always changing....
But then we have the mutate (98-others.conf file, 4º mutate in file) that is always increasing the number of events, showing that some events enter the mutate, but never leave it!

The mutate itself is very simple:

mutate {
  id => "98-others.conf-4"
  add_field => { "logstash" => "server-a03" }
}

It is just adding in what logstash machine the event was processed, so i can later make performance graphics per machine or debug any logstash problem.

I have no idea why it is locking up. If i remove this mutate, the next mutate will then start to lock up events, so it doesn't look like this mutate is really the cause, only the trigger. both mutate are global, apply to all events, as i have many other mutates inside of if tests earlier

I'm suspecting that some event is being dropped earlier, but in some performance optimization, the mutate is already being applied to that dropped events and the mutate will end with some leak from losing the event while applying the change

So is this a bug, a known issue and anyone knows any workaround this?

Thanks for the help
Daniel

EDIT:
i just notice that the number of events in one of the drop is exactly the same as the ones in the mutate, so this further links the two, as i suspected

system · March 16, 2018, 6:16pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logstash stops processing events Logstash	1	723	August 2, 2017
Debugging questions + advice request Logstash	4	347	March 19, 2018
Logstash 2.3.2 hung with no useful logging Logstash	4	1353	July 6, 2017
Logstash Kafka input loosing events Logstash	1	353	July 23, 2017
Logstash 5 stops event output after some time Logstash	4	1668	August 15, 2017

Events being stuck in a mutade

Related topics