Approach: Common Information Model for Elastic Stack

Greetings,

I have an environment where I'm collecting a number of disparate logs from the network and hosts and indexing them in Elasticsearch. Logs arrive mostly over syslog from network devices, Linux and BSD servers, and various applications, so their formats differ, and the default grok filters for many of the log types produce different field names for the same kind of data. For example, a source IP address appears in Nginx and Apache logs (as the client IP) but also in Netflow events, firewall filter logs, SSH logs, and so on. The same goes for many other fields: various logs carry a username, source and destination port numbers, and so on, often under different field names.

I'd like to normalize these to common field names across as much of this data as possible, so I can correlate, report, and pivot on logs from various systems and sources. I believe it's possible to do this with field munging in Logstash, field aliases, etc. But I'd like to think bigger and find out whether it would be possible to accomplish something like the Common Information Model (CIM) that Splunk uses for a similar purpose:
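For what it's worth, the "field munging" part is straightforward today with a Logstash `mutate`/`rename` filter; the field names below (`clientip`, `src`, `source_ip`, and the `type` values) are just placeholders for whatever your grok patterns currently emit, not an established standard:

```
filter {
  # Hypothetical: normalize the client address from web server access
  # logs and the source address from firewall logs onto one common name.
  if [type] == "apache" or [type] == "nginx" {
    mutate { rename => { "clientip" => "source_ip" } }
  }
  if [type] == "firewall" {
    mutate { rename => { "src" => "source_ip" } }
  }
}
```

This works, but without a shared model every site ends up inventing its own target names, which is exactly the problem a CIM-style standard would solve.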

The Common Information Model is a set of field names and tags which are expected to define the least common denominator of a domain of interest. It is implemented as documentation on the Splunk docs website and JSON data model files in this add-on. Use the CIM add-on when modeling data or building apps to ensure compatibility between apps, or to just take advantage of these data models to pivot and report.

One primary value of a model like the CIM is that it provides a central standard to which all users can align, and avoids the situation where one implementer's common field model uses different terms than another's. Logstash grok filters, for example, could express common fields in different data using the same common SEMANTIC, allowing harmonized patterns that could be used everywhere. Kibana dashboards could be written against the same field names for charts and reports. The point is that important field names in one Elastic environment would match those in another, establishing a standard that can be adopted selectively (but hopefully always, because of the value it provides).
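To illustrate the "same semantic" idea: grok patterns for two completely different log formats can deliberately capture into the same field names, so downstream dashboards never need to know which source an event came from. These are simplified sketches, not production patterns:

```
filter {
  # Both patterns capture the address into the shared field "source_ip"
  # and the account into "user", regardless of the log format.
  if [type] == "nginx" {
    grok {
      match => { "message" => "%{IPORHOST:source_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\]" }
    }
  }
  if [type] == "sshd" {
    grok {
      match => { "message" => "Failed password for %{USERNAME:user} from %{IP:source_ip}" }
    }
  }
}
```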

Does this concept exist in Elastic Stack land already? If not, is there potential for this to come to be and would it provide value?

We are working on an Elastic Common Schema in tandem with a few non-Elastic people.

Let me see if I can get someone involved in the project to comment.

Yes, for the same reasons you cite, we are indeed working on such a data model, or schema, which we are currently calling the Elastic Common Schema (ECS). While it is still a work in progress, we announced a public ECS GitHub repo in June 2018. Our goal is that, with the collaboration of the community, we can quickly refine this into a released version. Each successful "mapping" of a new data source onto ECS helps the schema converge to a state that can support a wide variety of data sources in the future.
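To make that concrete, "mapping" a data source onto ECS largely means renaming its existing fields onto ECS names such as `source.ip`, `user.name`, and `destination.port`. A sketch in Logstash (the source field names `src_ip`, `username`, and `dst_port` are hypothetical, and since ECS is still in development you should check the repo for the current field definitions):

```
filter {
  # Hypothetical rename of legacy fields onto ECS field names.
  # [source][ip] is Logstash's bracket syntax for the nested
  # ECS field "source.ip".
  mutate {
    rename => {
      "src_ip"   => "[source][ip]"
      "username" => "[user][name]"
      "dst_port" => "[destination][port]"
    }
  }
}
```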

As I mentioned, ECS is still under development, so any feedback you could provide, in terms of fields that might be missing, questions about how to map your data source fields to ECS, or even general feedback, would be appreciated.

Please check out the ECS Repo README.md and the ECS Contributing Guide.

Thanks!


Thanks for pointing this out, I'm excited to see the effort.

As it develops, socializing this with the community will be super important.

