Logstash Version: 2.4
Elasticsearch Version: 2.3.4
Redis version: 2.19
I have an indexing setup where Logstash reads from Redis, parses the data, and indexes into Elasticsearch (fairly standard). I monitor the list size of Redis as a primary health check.
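The health check itself is just the Redis list length. A minimal sketch of that check, assuming the list key is `logstash` (the same key used in the lrange command further down):

```shell
# One-off check of the queue depth (LLEN returns the list length).
redis-cli llen logstash

# Or poll it continuously, e.g. every 5 seconds, with a timestamp:
while true; do
  echo "$(date '+%H:%M:%S')  $(redis-cli llen logstash)"
  sleep 5
done
```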
Starting about 2 days ago, Logstash will seemingly randomly stop pulling data from Redis and inserting it into Elasticsearch.
In the past, I had an issue where Elasticsearch did not have enough memory to handle all of the data I was feeding it, which produced similar behavior from Logstash; in that case, though, throughput decreased, whereas now it drops to zero. I did check the data/memory ratio and the JVM heap behavior for all of my ES nodes, and they are all healthy sawtooths, garbage-collecting whenever they hit approximately 70%. In addition, since I have multiple Logstash indexers (on different servers) feeding Elasticsearch, they don't all exhibit this zero-throughput behavior at the same time. So Elasticsearch doesn't seem to be the problem this time.
Checking the perf stats of Logstash doesn't seem to reveal much (to me at least).
Top will show the offending Logstash processes between 0.0% - 0.3% cpu usage.
Running the command:
redis-cli lrange logstash 0 0
shows that the head message in Redis is not being removed, whereas against a healthy Logstash the same command returns a different item each time (because Logstash is constantly consuming messages).
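To make that check mechanical, a rough stall detector can compare the head element across an interval (hypothetical script; the key name and the 30-second window are assumptions to adjust for your setup):

```shell
#!/bin/sh
# If the list is empty, everything has been consumed - not a stall.
if [ "$(redis-cli llen logstash)" -eq 0 ]; then
  echo "OK: list is empty"
  exit 0
fi

# Compare the head of the list across a 30-second window.
# If it hasn't changed, Logstash has probably stopped consuming.
head_before=$(redis-cli lrange logstash 0 0)
sleep 30
head_after=$(redis-cli lrange logstash 0 0)

if [ "$head_before" = "$head_after" ]; then
  echo "STALLED: head of 'logstash' list unchanged for 30s"
else
  echo "OK: Logstash is consuming"
fi
```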
Memory: Logstash is set to its default heap of 1G, and heap usage for the offending Logstash instances (while at 0% CPU) is not much different from the ones working normally. I usually see 600-700m in use with no real growth, so memory doesn't appear to be the constraint here.
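The heap figures come from the JVM itself; a sketch of the check, assuming a JDK is installed on the indexer host (`<logstash-pid>` is a placeholder for the actual process ID):

```shell
# Print heap/GC utilization every 5 seconds (jstat ships with the JDK).
# E/O columns = eden and old-gen occupancy %; YGC/FGC = GC counts.
jstat -gcutil <logstash-pid> 5000

# Resolve the PID first if needed:
pgrep -f logstash
```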
Logs: In default logging mode, there are no ERROR logs being generated when this happens. When I turned on Debug, I couldn't see anything other than the flood of messages that are generated when Logstash is working properly.
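Since nothing shows up in the logs, the only other evidence I can think to capture is a thread dump while the stall is happening, to see whether the pipeline threads are blocked (sketch, assuming a JDK; `<logstash-pid>` is a placeholder):

```shell
# Capture a thread dump from the stalled Logstash JVM; look for the
# Redis input thread and check whether it is RUNNABLE or blocked.
jstack <logstash-pid> > /tmp/logstash-threads.txt

# Without jstack, SIGQUIT makes the JVM print the dump to its stdout/log:
kill -QUIT <logstash-pid>
```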
Prior to this, Logstash worked nearly perfectly, and no changes have been made to the indexer in the past 2 days.
Any help would be great. I'm not sure where to look anymore. The only workaround I have is to restart Logstash when it exhibits this behavior, and about 70% of the time the restarted Logstash will begin reading messages again, only to stop sometime in the near future. (This morning we restarted a Logstash, and 2 hours later, it stopped reading.)