Greetings,
I have an environment where I'm collecting a number of disparate logs from the network and hosts and indexing them in Elasticsearch. Most logs are delivered over syslog from network devices, Linux and BSD servers, and various applications. Their formats differ, and the default grok filters for many of these log types produce different field names for the same type of data. For example, a source IP address appears in Nginx and Apache logs (the client IP) but also in Netflow events, firewall filter logs, SSH logs, and so on. The same goes for many other kinds of fields: various logs carry a username, source and destination port numbers, and so on, often under different field names.
I'd like to normalize these fields to common field names across as much of this data as possible, so that I can correlate, report, and pivot on logs from various systems and sources. I believe this is possible with field munging in Logstash, field aliases, and so on. But I'd like to think bigger and find out whether it would be possible to build something like the Common Information Model (CIM) that Splunk uses for a similar purpose:
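To make the "field munging" idea concrete, here is a minimal sketch in Python (rather than Logstash config, so it runs standalone). The per-source field names and the common names (src_ip, src_port, dest_ip, dest_port, user) are hypothetical examples, not an existing standard, and real grok or codec output may use different names. In Logstash the equivalent step would typically be a mutate filter using rename.

```python
from typing import Any, Dict

# Hypothetical per-source mappings from native field names to common names.
# These names are illustrative only; actual grok/codec output may differ.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "apache": {"clientip": "src_ip", "auth": "user"},
    "sshd": {"ip": "src_ip", "port": "src_port", "username": "user"},
    "firewall": {"src": "src_ip", "sport": "src_port", "dst": "dest_ip", "dport": "dest_port"},
}


def normalize(source_type: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of event with native field names renamed to common names.

    Fields without a mapping keep their name; unknown source types pass through.
    """
    mapping = FIELD_MAPS.get(source_type, {})
    return {mapping.get(key, key): value for key, value in event.items()}


if __name__ == "__main__":
    web = normalize("apache", {"clientip": "203.0.113.7", "verb": "GET"})
    ssh = normalize("sshd", {"ip": "203.0.113.7", "port": 52144, "username": "alice"})
    # Both events now expose the client address under the same key.
    print(web["src_ip"] == ssh["src_ip"])  # True
```

The catch, and the motivation for a shared model, is that each site ends up writing its own FIELD_MAPS with its own choice of target names.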
The Common Information Model is a set of field names and tags which are expected to define the least common denominator of a domain of interest. It is implemented as documentation on the Splunk docs website and JSON data model files in this add-on. Use the CIM add-on when modeling data or building apps to ensure compatibility between apps, or to just take advantage of these data models to pivot and report.
One primary value of a model like the CIM is that it provides a central standard to which all users can align. This avoids the situation where one implementer comes up with a common field model that uses different terms from another implementer's model. For example, Logstash grok filters could then express common fields in different data using the same SEMANTIC (the field-name part of a %{SYNTAX:SEMANTIC} pattern), allowing harmonized patterns that could be used everywhere. Kibana dashboards could be written to use the same field names for charts and reports. The key point is that field names in one Elastic environment would be the same as in another, establishing a standard that can be adopted selectively (but hopefully always, because of the value it provides).
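As a rough illustration of "the same SEMANTIC" across different data, the sketch below uses Python named regex groups as a stand-in for grok %{SYNTAX:SEMANTIC} captures. The patterns and log lines are simplified, hypothetical examples, not the stock grok patterns. Because both patterns name their captures src_ip and user, the reporting function works on either source without knowing which one matched.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Simplified, hypothetical patterns. Each uses the same group names (the
# "SEMANTIC" part) so downstream code does not care which source matched.
PATTERNS = {
    "sshd": re.compile(
        r"Failed password for (?P<user>\S+) from (?P<src_ip>[\d.]+) port (?P<src_port>\d+)"
    ),
    "apache": re.compile(
        r'^(?P<src_ip>[\d.]+) \S+ (?P<user>\S+) \[[^\]]+\] "(?P<method>[A-Z]+) '
    ),
}


def parse(line: str) -> Optional[Dict[str, str]]:
    """Try each pattern in turn; return its named captures plus the source type."""
    for source_type, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            fields = match.groupdict()
            fields["source_type"] = source_type
            return fields
    return None


def top_sources(lines: List[str]) -> Counter:
    """Count events per src_ip, regardless of which log format produced them."""
    return Counter(e["src_ip"] for e in map(parse, lines) if e is not None)


if __name__ == "__main__":
    logs = [
        "Failed password for alice from 203.0.113.7 port 52144 ssh2",
        '203.0.113.7 - bob [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
        '198.51.100.2 - - [10/Oct/2023:13:55:40 +0000] "POST /login HTTP/1.1" 401 0',
    ]
    print(top_sources(logs).most_common(1))  # [('203.0.113.7', 2)]
```

A Kibana visualization keyed on src_ip would benefit in the same way: one chart covers every source whose patterns follow the shared naming.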
Does this concept already exist in the Elastic Stack world? If not, is there potential for it to come about, and would it provide value?