That is, are you counting up identical messages in your logging library or in Logstash, then emitting a single message like this:
(original message) ...
This message was repeated 987 times.
When you try to visualize log rate in Kibana by sheer count of messages, the visualization is now incorrect: each aggregated message counts as one document even though it stands for many original messages.
I’ve been thinking about how to regain accurate insight into the actual message rate the source is generating. We should be able to see how much an application is actually trying to output.
So I thought it might be useful to add a field to indicate if a message is an aggregation.
Maybe something like “repeat_count”.
An ingest pipeline could ensure it’s always set by defaulting it to 1. Then, rather than visualizing by sheer count of messages, you can visualize by the sum of “repeat_count”.
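To make the idea concrete, here is a minimal Python sketch of the logic, not an actual ingest pipeline. The field name "repeat_count" is the one proposed above; the function names and sample events are made up for illustration.

```python
from typing import Iterable


def with_default_repeat_count(event: dict) -> dict:
    """Return a copy of the event with repeat_count set, defaulting to 1.

    This mimics what an ingest pipeline step could do for events that
    were never aggregated upstream.
    """
    enriched = dict(event)
    enriched.setdefault("repeat_count", 1)
    return enriched


def actual_message_count(events: Iterable[dict]) -> int:
    """Sum repeat_count over all events instead of counting documents."""
    return sum(with_default_repeat_count(e)["repeat_count"] for e in events)


# Hypothetical sample: two plain events and one upstream aggregation.
events = [
    {"message": "disk full"},
    {"message": "disk full", "repeat_count": 987},
    {"message": "service started"},
]

print(len(events))                   # sheer count of documents
print(actual_message_count(events))  # sum of repeat_count
```

With this sample, the document count is 3 while the sum of "repeat_count" is 989, which is the gap a count-based visualization would hide.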
Q1. Are there any existing solutions? Does ECS already have a field that I missed?
Q2. Do you agree about the problem?
Q3. Do you see this as a reasonable solution?
Brilliant! I love that this feature's in ES. Thanks for the pointer, Stephen.
Now I need a way to convey the _doc_count data from the upstream aggregators so that it turns into the _doc_count field once it gets into ES. I haven't seen a field in ECS for this, but let me know if you have.
If anyone has any ideas about how to represent this in ECS, I'd be glad to hear them.
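For illustration only, one way the hand-off might look is a small transform that runs just before indexing and copies the upstream count into _doc_count. This is a sketch under assumptions: the upstream field is called "repeat_count" (the name floated earlier in this thread, not an ECS field), the helper name is hypothetical, and the target cluster is a version that accepts _doc_count as a positive integer in the document body.

```python
def to_es_document(event: dict, count_field: str = "repeat_count") -> dict:
    """Build the document body to index, moving the upstream count to _doc_count.

    count_field is an assumed name for whatever the upstream aggregator emits.
    Events without it are treated as a single message.
    """
    doc = {k: v for k, v in event.items() if k != count_field}
    count = int(event.get(count_field, 1))
    if count < 1:
        # _doc_count is expected to be a positive integer.
        raise ValueError(f"{count_field} must be >= 1, got {count}")
    doc["_doc_count"] = count
    return doc


# Hypothetical events: one aggregated upstream, one not.
aggregated = to_es_document({"message": "disk full", "repeat_count": 987})
single = to_es_document({"message": "service started"})

print(aggregated)
print(single)
```

Dropping the upstream field after copying it is a design choice in this sketch; keeping both would also work if the original value is useful for debugging.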