Logstash stops processing after some time


(Meet Dave) #1

Hi,

In our stack, we have 12 logstash nodes. They work very fine when we restart all of them. After some time some nodes stop accepting data and some keeps going and processing. This way all of the nodes are getting down one by one. We were able to find these logs in logstash-stderr.log file Logstash-stderr.log when last event got processed after that logstash service was running, it kept sending monitoring data to elastic but ingestion was stopped.

What factors should we look into this? Is this something because of some filter? we are not seeing anything in stdout logs. This started happening suddenly until then everything was working fine. Earlier ingestion rate was nearly 20k/s but now it starts from 10k/s gradually decreasing and then stops. The only workaround we found was to restart nodes.

Thanks


(Meet Dave) #2

After enabling debug logs, we found too many batches pending: true lines in logs. I think there is something which is keeping pipeline blocked (https://github.com/logstash-plugins/logstash-input-beats/issues/310) and that pipeline gets blocked at the same time when this error gets logged into the file. Logstash-stderr.log.

Does this happen when we have large payload? or there is some filter causing this issue?

Thanks


(Christian Dahlqvist) #3

Are you sending data to Elasticsearch? If so, are you having any problems with Elasticsearch? Is there anything in the logs? Is there any problem with your data that causes an error when indexing into Elasticsearch? Have you tried enabling debug logging on Logstash?


(Meet Dave) #4

@Christian_Dahlqvist Thank you for your reply. I was able to reproduce this issue by using following logstash filter

input {
    beats {
        port => 5044
    }
}
filter {
    dissect {
        mapping => {
            "message" => "%{@timestamp} %{msg}"
        }
   }
}

output {
   stdout {
       codec => rubydebug { metadata => true }
   }
   elasticsearch {
       hosts => ["localhost:9200"]
       index => "%{[index]}_%{+YYYY.MM.dd}"
   }
} 

dissect filter assigns string in @timestamp field and when event contains string type in @timestamp field, the output cannot resolve index name as there is {+YYY.MM.dd} and I assume it expects date instead of string and throws above error. Strange though, this wasn't happening in 5.6 or maybe log pattern changed on our side i don't know and caused this type of error. When logstash throws this error it stops all workers so no event goes into processing and it puts back pressure on inputs.

Instead of specifying @timestamp directly to dissect filter we should use date filter to parse dates.

Shouldn't there be a final check to parse @timestamp field in elasticsearch output while parsing index so that any misconfigured filter won't cause this issue?


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.