ECS field for pre-aggregated messages

ES 7.11 brought us the _doc_count metadata field and the ability to properly compute aggregations over individual ES documents that represent multiple occurrences.

I would like to use this feature to preserve the original count of messages generated by logging facilities that "compress" / "aggregate" / "deduplicate" repeated messages. For example, when our application recognizes that the message it's trying to log has been repeated 100 times, it outputs a single copy of the message and appends text like so:

Connection to server failed.
...
This message was repeated 100 times.

But since we're using structured logging, we could instead add the repeat count to the log document as a distinct piece of information, something like {"message_repeat_count":100}. (I'm using this name to clarify the meaning, but a different name would be more appropriate.)
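To make that concrete, here is a minimal sketch in Python of a logger-side helper that collapses consecutive identical messages into one structured record. The class name RepeatCollapser and its methods are hypothetical, and message_repeat_count is only the placeholder field name used above, not a proposed ECS name.

```python
import json


class RepeatCollapser:
    """Collapse consecutive identical messages into one record with a count.

    Hypothetical helper for illustration; not part of any logging library.
    """

    def __init__(self):
        self._last = None   # message currently being counted
        self._count = 0     # how many times it has been seen in a row

    def log(self, message):
        """Feed one message; return a finished JSON record, or None."""
        if message == self._last:
            self._count += 1
            return None
        finished = self.flush()  # emit the previous run, if there was one
        self._last = message
        self._count = 1
        return finished

    def flush(self):
        """Emit the pending record (if any) and reset the state."""
        if self._last is None:
            return None
        record = json.dumps(
            {"message": self._last, "message_repeat_count": self._count}
        )
        self._last = None
        self._count = 0
        return record


collapser = RepeatCollapser()
records = []
for msg in ["Connection to server failed."] * 100 + ["Connected."]:
    out = collapser.log(msg)
    if out is not None:
        records.append(out)
records.append(collapser.flush())  # emit whatever is still pending
```

With this input, records holds two JSON documents: one for the failure message with a repeat count of 100, and one for "Connected." with a repeat count of 1.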

Once this document reaches Elasticsearch, we could use an ingest pipeline to set _doc_count from the repeat count field. That way we could "compress" our log data transmissions and still see the true count in Elasticsearch/Kibana.
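The effect I'm after can be simulated locally. The Python sketch below is not an ingest pipeline definition. It only mimics the two steps under stated assumptions: copying the repeat count into _doc_count (which Elasticsearch requires to be a positive integer), and counting documents the way a _doc_count-aware aggregation would, treating a missing _doc_count as 1. The function names apply_doc_count and count_by_message are hypothetical.

```python
from collections import Counter


def apply_doc_count(doc, source_field="message_repeat_count"):
    """Return a copy of doc with _doc_count set from the repeat count field.

    Stands in for what an ingest pipeline step would do; illustration only.
    """
    out = dict(doc)
    count = out.get(source_field, 1)  # no repeat count means a single event
    # _doc_count must be a positive integer, so reject anything else.
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        raise ValueError(f"{source_field} must be a positive integer, got {count!r}")
    out["_doc_count"] = count
    return out


def count_by_message(docs):
    """Sum _doc_count per message, defaulting to 1 when the field is absent."""
    totals = Counter()
    for doc in docs:
        totals[doc["message"]] += doc.get("_doc_count", 1)
    return totals


docs = [
    apply_doc_count(
        {"message": "Connection to server failed.", "message_repeat_count": 100}
    ),
    {"message": "Connected."},  # ordinary document without _doc_count
]
totals = count_by_message(docs)
```

Here two stored documents represent 101 events: totals reports 100 for the failure message and 1 for "Connected.", which is the count I would want to see in Kibana.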

I wanted to ask the ECS / Elastic team (and customers) if they felt this was a useful idea and appropriate for ECS. (@andrewthad made this same suggestion 2 years ago.)

Assuming the basic idea is good, what should the field be called?

How about event.count?

Additional thoughts on the field name:

_doc_count is the ES document metadata name, so a leading underscore is appropriate, but in the context of ECS the information is not metadata. (ECS doesn't have the concept of metadata.)

doc_count would map nicely to the ES field, but the objective of ECS does not seem to be to build a schema that aligns with ES per se, but to represent (tech industry) data. This name also has the problem of being about "documents" rather than events. (I feel it's best to be clear when we're talking about events versus the documents that represent them.)

count as a base field might not be too bad, but this leaves it ambiguous as to whether we're talking about the count of the actual events or of the records/messages about events.

Placing the field within the event field set does a better job of indicating that we're talking about the count of the actual events instead of the count of documents. (Though I note that some fields in there (1, 2) are about the record rather than the event.)
