ES 7.11 brought us the _doc_count metadata field and the ability to properly compute aggregations over individual ES documents that represent multiple occurrences.
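To make the counting behaviour concrete, here is a minimal local sketch in Python (not Elasticsearch code) of how a bucket aggregation treats _doc_count: a document that carries the field contributes that many occurrences, and a document without it contributes 1. The function name `bucket_counts` and the sample documents are made up for illustration.

```python
from collections import Counter


def bucket_counts(docs, field):
    """Count occurrences per value of `field`, weighting each document
    by its _doc_count and treating a missing _doc_count as 1."""
    counts = Counter()
    for doc in docs:
        counts[doc[field]] += doc.get("_doc_count", 1)
    return dict(counts)


# Hypothetical sample data: one pre-aggregated document and two plain ones.
docs = [
    {"message": "Connection to server failed.", "_doc_count": 100},
    {"message": "Connection to server failed."},
    {"message": "Disk almost full."},
]

counts = bucket_counts(docs, "message")
print(counts)
```

With this sample data the failed-connection message is counted 101 times even though only two documents carry it.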
I would like to make use of this feature to preserve the original count of messages generated by logging facilities that "compress" / "aggregate" / "deduplicate" repeated messages. For example, when our application recognizes that the message it's trying to log has been repeated 100 times, it outputs a single copy of the message and appends text like so:
Connection to server failed.
...
This message was repeated 100 times.
But since we're using structured logging, we could instead add the repeat count to the log document as a distinct piece of information, something like {"message_repeat_count":100}. (I'm using this name to clarify the meaning, but a different name would be more appropriate.)
Once this document reaches Elasticsearch, we could use an ingest pipeline to set _doc_count from the repeat count field. That way we could "compress" our log data transmissions and still see the true count in Elasticsearch/Kibana.
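As a rough sketch of what such a pipeline could look like, the Python below builds an ingest pipeline body that copies the repeat count into _doc_count with a Painless script processor. Several things here are assumptions on my part: the field name message_repeat_count is just the placeholder used above, the helper name `build_repeat_count_pipeline` is made up, and I have not verified against a cluster that an ingest processor is allowed to write _doc_count.

```python
import json


def build_repeat_count_pipeline(source_field="message_repeat_count"):
    """Return an ingest pipeline body that copies `source_field` into _doc_count.

    `source_field` is a placeholder name; documents without the field
    are left untouched by the `if` condition.
    """
    return {
        "description": "Set _doc_count from a repeat-count field (sketch)",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    # Only run when the repeat count is present on the document.
                    "if": f"ctx['{source_field}'] != null",
                    # Assumption: writing ctx._doc_count from ingest is permitted.
                    "source": "ctx._doc_count = ctx[params.field]",
                    "params": {"field": source_field},
                }
            }
        ],
    }


pipeline = build_repeat_count_pipeline()
print(json.dumps(pipeline, indent=2))
```

The printed JSON is the kind of body one would send to the `PUT _ingest/pipeline/<name>` API. Depending on the final field name, the source field could also be removed after the copy, but that is a separate design choice.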
I wanted to ask the ECS / Elastic team (and customers) if they felt this was a useful idea and appropriate for ECS. (@andrewthad made this same suggestion 2 years ago.)
Assuming the basic idea is good,