As suggested in the ECS documentation, I want to set the ecs.version field in my Logstash pipelines. But since I'm using multiple cascading pipelines (and receive events via Filebeat), sometimes the field is already set and I end up with multiple values in one field.
What should I do? Clean up the field before setting? Leave all values?
To be more clear about what I'm doing:
I receive messages via Filebeat (which seems to set ecs.version to 1.4.0)
I process the messages through a syslog pipeline which parses the syslog header using an out of the box grok pattern which doesn't honor ECS
Only the log events containing postfix in the program field are afterwards processed by a postfix pipeline, which also sets the ecs.version field, this time to 1.5.0
So I end up with an event that:
has some fields correctly set according to ECS 1.4.0 and the value 1.4.0 in ecs.version
has some fields that don't fit into ECS at all (like pid) but still ecs.version is set
has some fields conforming to ECS 1.5.0 due to the postfix pipeline
What should I do?
Always check for the existence of the field and, if it exists, remove it and set it to the highest version?
Have an extra pipeline to rename the fields and set the version? (seems hard to handle and like a waste of resources)
Set only the minimum version that the whole event conforms to?
If you are ingesting messages that conform to, and are tagged with, a given version, it makes no sense to me to change that version tag unless you are also reformatting the messages.
If you are changing the messages to conform to a different version then you should set or overwrite the version they are tagged with.
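As a rough sketch of the overwrite case: this assumes the postfix pipeline is the place where events get reshaped to ECS 1.5.0, and that the field arrives from Filebeat as the nested field [ecs][version]; both are assumptions about your setup. The mutate filter's replace option sets the field to a single value whether or not it already exists, whereas add_field on a field that already exists appends to it and turns it into an array, which is probably where the multiple values come from.

```
filter {
  # ... postfix parsing that produces ECS 1.5.0 fields goes here ...

  mutate {
    # replace overwrites an existing value (or creates the field if it is
    # missing), so ecs.version stays a single string instead of an array
    replace => { "[ecs][version]" => "1.5.0" }
  }
}
```

If a pipeline only passes events through without reshaping them, leaving this block out keeps whatever version the source already set.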
The short of it is, if your pipeline receives an "older" ECS event in terms of ECS version, then proceeds to make adjustments that are dependent on more recent versions of ECS, then you're welcome to override the value to the correct one.
Note also that if your source is ECS 1.4.0 and you make further adjustments that don't rely on new fields from ECS 1.5.0, I don't think there's actually a need to override the version.
At this time, the uses of this field are, IMO, the following:
Helping you assess which sources are falling behind in terms of ECS version, just like you can do with agent.version.
Helping data sources set correct expectations with regard to which ECS version they were developed against. If new fields are added in ECS that affect this source, but the source hasn't been updated yet (and hence the ECS version remains older), then you know what to expect: it will likely not populate the new fields. It's a way to decouple progress on ECS and progress on data sources, while setting clear expectations.
Of course it's also there to help in future situations where we may want to ensure we consume only recent enough events, because subtle changes have happened between ECS 1.X and 1.Y and using older events would lead to problems.
However I'd like to point out one thing about the last point (using ecs.version as a predicate): if a field did not exist in 1.X and now exists in 1.Y, you do not need to filter out the 1.X events. In Elasticsearch, if you query for myfield: value over index-old,index-new, and the new myfield only exists in index-new, Elasticsearch will simply return documents that match in index-new without problem.
In other words, no changes since ECS 1.0 come to mind that would require using ecs.version in a predicate.
I'd also like to set an expectation about Beats here. Despite the schema point of view I gave above -- that each source should set the appropriate ecs.version according to the version they were developed against -- Beats does not currently do that exactly. Beats currently sets ecs.version across all Beats, every time new ECS field definitions are imported, regardless of whether the sources (e.g. different Filebeat modules) will actually populate them.
It has come up multiple times for the team that this causes confusion, so I think they will address it in a future version.
And I would not recommend making the field an array either, btw. I don't think this would cause significant issues, but it's not how the field is designed.
Ok, thanks. That helped a lot. Now I know the general outline.
I'm still not totally sure how I want to proceed. On one hand, I already get the ecs.version field from Filebeat, so I wouldn't have to change anything. On the other hand, I want to show that I tried to stick to ECS with my Logstash pipeline.
I think I might just add a filter which checks for the existence of the field and adds it only if it's not present. This way I can deal with different sources.
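A minimal sketch of that filter, assuming the field is the nested [ecs][version] and that 1.5.0 is the version my own pipeline is written against (both are my assumptions):

```
filter {
  # only tag events that arrive without an ECS version;
  # events already tagged by Filebeat keep their original value
  if ![ecs][version] {
    mutate {
      add_field => { "[ecs][version]" => "1.5.0" }
    }
  }
}
```

Because add_field only runs when the field is absent, it can't append a second value here, so the field stays a single string.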