I want to migrate my custom Logstash pipeline to the Elastic Common Schema (ECS), since I want to get started with the Elastic SIEM UI. In https://www.elastic.co/blog/migrating-to-elastic-common-schema-in-beats-environments it is mentioned that migrating custom pipelines will be covered in a future post, but I'm hoping I can already get some input here on the forum.
I have a pfSense firewall running which sends its firewall logs to ELK via syslog => logstash => elasticsearch. I also have Snort enabled on that pfSense box, with Barnyard sending its logs to ELK via syslog => logstash => elasticsearch.
To get the syslog entries parsed nicely, I'm using grok patterns for both the firewall logs and the Snort logs. The Snort log filtering and grok patterns are custom-built.
The fields therefore carry custom labels in the grok patterns, like this: %{INT:ids_priority}\]\:? \<%{PFSENSE_IFACE}\> \{PROTO\:%{INT}\} %{IP:src_ip} \-\> %{IP:dst_ip}$",
Now, my question is whether it would be good practice to migrate to ECS directly in my Logstash filter configuration, for example by changing src_ip to source.ip (https://www.elastic.co/guide/en/ecs/current/ecs-source.html) in the example above? I could then do a search-and-replace on the src_ip field in my dashboards. Or should I do the mapping elsewhere in the chain (e.g. in Elasticsearch)?
Yes, changing the output field name directly in the grok pattern, or in a plugin's attribute that sets the destination of its output (e.g. GeoIP), would be a great start.
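As a sketch of what that could look like in a Logstash filter (the pattern fragment and field names here are illustrative, not your actual config), you can either capture straight into nested ECS field names using grok's bracket syntax, or rename existing flat fields afterwards with mutate:

```conf
filter {
  grok {
    # Capture directly into nested ECS fields via [field][subfield] syntax.
    match => { "message" => "%{IP:[source][ip]} \-\> %{IP:[destination][ip]}$" }
  }

  # Alternatively, keep the existing grok and rename the flat fields after parsing:
  mutate {
    rename => {
      "src_ip" => "[source][ip]"
      "dst_ip" => "[destination][ip]"
    }
  }
}
```

Capturing into the nested names directly keeps the pipeline simpler; the mutate/rename route is handy as a transition step while your dashboards still reference the old field names.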
Once you have the field names right, you'll want to look carefully at the field datatypes as well. You can of course start with the ECS documentation for this.
But here are two additional pointers:
First, you can look at and experiment with sample Elasticsearch templates that contain only the ECS fields. You can check them out directly in the git repo here: https://github.com/elastic/ecs/tree/master/generated/elasticsearch. You'll likely want to use the appropriate git version tag, not the files in master, which is a development branch.
Secondly, ECS formalizes the pattern that in monitoring use cases most string fields are used for aggregations and exact-match searches/filtering, while full-text search is rarely needed. So in general, string fields are mapped directly as the keyword datatype, and there's no multi-field named myfield.keyword.
In other words, where Elasticsearch and the Logstash ES template default to setting up every string field as myfield == text plus myfield.keyword == keyword, ECS will mostly have only myfield == keyword. If you need full-text search on some of these fields, it's perfectly valid to add a multi-field in your template and end up with the reverse convention: myfield == keyword (the ECS field) and myfield.text == text (your custom additional field).
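That reverse convention could look like the following mapping fragment (myfield is a placeholder, not an actual ECS field; the ignore_above value mirrors the common default in the generated ECS templates):

```json
{
  "mappings": {
    "properties": {
      "myfield": {
        "type": "keyword",
        "ignore_above": 1024,
        "fields": {
          "text": { "type": "text" }
        }
      }
    }
  }
}
```

With this in place, aggregations and exact filters go against myfield, while full-text queries target myfield.text.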
The two exceptions to this string field convention are message and error.message, which are text only, at this time.
Again, if you have log sources where you need aggregations on these fields, you can add a custom multi-field using the default ES convention, i.e. add message.keyword.
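A minimal mapping sketch for that (following the default Elasticsearch convention of a keyword multi-field under a text field; the ignore_above value is illustrative):

```json
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 1024 }
        }
      }
    }
  }
}
```

Here message stays text-only as ECS defines it, and the added message.keyword sub-field gives you aggregations and exact matches on top.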