Looking for a little advice on narrowing down the source of a problem we have with our ELK instance.
Intermittently, we're seeing the received and emitted EPS on an LS node drop to zero, and we can't figure out why. The image below shows what happens.
We've rolled out 6.7.1 for LS (along with ES and K on other nodes), but this hasn't fixed the issue. Other things we've tried:
- Increased the heap.
- Disabled some filter plugins in the conf files (jdbc_streaming, drop, geoip, cidr).
- Increased/decreased pipeline.batch.size through 125, 250, 500 and 1000, as the system is able to cope with it.
- Ran bin/logstash-plugin update.
None of the above has fixed the issue, and I'm unsure where to look next. The heap graph looks 'wrong' to me when the issue occurs, but I don't know how to investigate that.
A tcpdump on the ingress interface shows that all input ports are receiving traffic (we use a mix of beats, syslog, and json).
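For anyone wanting to compare numbers, the EPS and heap figures can also be sampled straight from the Logstash monitoring API rather than read off the graphs. The following is only a minimal sketch and not what our dashboards actually run. It assumes the API is reachable on its default address (localhost:9600), and the function names (fetch_stats, eps_between, watch) are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: the Logstash monitoring API listens on its default address.
STATS_URL = "http://localhost:9600/_node/stats"


def fetch_stats(url=STATS_URL):
    """Fetch one node stats document and stamp it with the local sample time."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        stats = json.load(resp)
    stats["_sampled_at"] = time.time()  # hypothetical key added by this sketch
    return stats


def eps_between(first, second):
    """Return (received EPS, emitted EPS) between two stats samples."""
    elapsed = second["_sampled_at"] - first["_sampled_at"]
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    # events.in / events.out are cumulative counters, so the rate is the delta.
    received = (second["events"]["in"] - first["events"]["in"]) / elapsed
    emitted = (second["events"]["out"] - first["events"]["out"]) / elapsed
    return received, emitted


def watch(interval=10):
    """Print EPS and heap usage every `interval` seconds until interrupted."""
    prev = fetch_stats()
    while True:
        time.sleep(interval)
        cur = fetch_stats()
        received, emitted = eps_between(prev, cur)
        heap = cur["jvm"]["mem"]["heap_used_percent"]
        print(f"in={received:.1f}/s out={emitted:.1f}/s heap={heap}%")
        prev = cur
```

Running watch() on the affected node during a stall should show whether events.in stops as well, or only events.out, alongside the heap percentage at that moment.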
Any ideas or suggestions?