So far I have ~300 hosts forwarding their log4j and syslog logs to Logstash, along with a handful of other log types. The log4j filters are kind of ugly, and I think they may be related to my problem. The problem started when I added Redis inline. Right now the flow is Logstash Forwarder > Logstash > Redis > Logstash, with two boxes each running Logstash and Redis. I added Redis because sending logs directly to Logstash without it was crashing the Logstash Forwarder input after a while.
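One way to reason about where duplication could creep in is to model this flow as two filter stages joined by a queue. The sketch below is a hypothetical Python model, not Logstash itself: a deque stands in for the Redis list, and each Logstash instance is a function that applies a list of filter callables. It assumes, purely for illustration, that the same filters might be loaded on both the shipper and the indexer instance; if that were the case, every event would pass through them twice.

```python
from collections import deque

# Hypothetical model of: Logstash Forwarder > Logstash (shipper) > Redis list > Logstash (indexer).
# A deque stands in for the Redis list (append ~ RPUSH, popleft ~ LPOP).

def shipper(event, filters, redis_list):
    """Shipper stage: run its filters on the event, then push it to the queue."""
    for f in filters:
        f(event)
    redis_list.append(event)

def indexer(redis_list, filters):
    """Indexer stage: drain the queue and run its own filters on each event."""
    out = []
    while redis_list:
        event = redis_list.popleft()
        for f in filters:
            f(event)
        out.append(event)
    return out

def count_runs(event):
    """Stand-in filter (illustrative only): records how often it touched the event."""
    event["filter_runs"] = event.get("filter_runs", 0) + 1

# Assumption for the example: the same filter set is loaded on both sides.
redis_list = deque()
shipper({"message": "ERROR something broke"}, [count_runs], redis_list)
events = indexer(redis_list, [count_runs])
print(events[0]["filter_runs"])  # 2
```

If the shipper only forwards events and leaves all filtering to the indexer, the same model reports a single filter run per event.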
It's almost as if break_on_match were not set, but there are times when it's even seeing the same log line multiple times, not just the same field. So for a given timestamp value XXX (timestamp, not @timestamp), I get multiple entries where, for instance:
timestamp = [XXX, XXX, XXX]
log_level = [ERROR, ERROR, ERROR]
and then
timestamp = [XXX, XXX, XXX, XXX]
log_level = [ERROR, ERROR, ERROR, ERROR]
even though the message is the same in both entries.
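A single value repeated three or four times in an array is what you would expect if a capturing filter ran more than once on the same event. As far as I understand grok's behavior, capturing into a field that already exists appends the new value (turning the field into an array) instead of replacing it, unless that field is listed in grok's overwrite option. The sketch below is a simplified, hypothetical Python model of that append behavior; add_value and grok_like are illustrative names, not Logstash APIs, and the captured values are just the placeholders from the example above.

```python
def add_value(event, field, value, overwrite=False):
    """Roughly mirrors grok's handling of a capture: set, or append if the field exists."""
    if overwrite or field not in event:
        event[field] = value
    elif isinstance(event[field], list):
        event[field].append(value)
    else:
        event[field] = [event[field], value]

def grok_like(event, overwrite=False):
    """Hypothetical stand-in for a grok match that captures timestamp and log_level."""
    add_value(event, "timestamp", "XXX", overwrite)
    add_value(event, "log_level", "ERROR", overwrite)

event = {"message": "XXX ERROR something broke"}
for _ in range(3):  # the same capture applied three times to one event
    grok_like(event)
print(event["timestamp"])  # ['XXX', 'XXX', 'XXX']
print(event["log_level"])  # ['ERROR', 'ERROR', 'ERROR']
```

With overwrite=True the model keeps a single scalar value no matter how many times the capture runs, which is why repeated filter passes and grok's overwrite option seem worth checking side by side.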
Example of results
Logstash configs
Logstash forwarder config
Sorry, the Logstash config is really ugly. I need to break out the grok filters, but in this case they may be pertinent.