Implementing ECS with custom fields

Hi,

We contributed the icinga module for Filebeat and started developing a Logstash pipeline for parsing Icinga logs: https://github.com/Icinga/icinga-logstash-pipeline. There seem to be far too many different rules to implement in the Filebeat module or the ingest pipeline, so we opted for a Logstash pipeline containing all of them.

I'm currently trying to align the parsing of logs that Filebeat reads directly from the filesystem with the parsing of logs that were already preprocessed by the icinga module in Filebeat. While doing this, I just realized that the output of the icinga module was changed to fit ECS better. I know I don't have to follow the ECS conventions for custom fields, but we want to stay as compatible as possible.

Could you help me decide how to build our custom fields with ECS compatibility in mind?

An example log event looks like this:

[2019-10-18 09:51:07 +0200] information/RemoteCheckQueue: items: 0, rate: 0/s (18/min 90/5min 270/15min);

Both the icinga module in Filebeat and the pipeline remove [2019-10-18 09:51:07 +0200] information/RemoteCheckQueue: and keep the rest as the "real" message.
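A minimal Python sketch of that header split, for illustration only (it is not the actual code of the Filebeat module or the Logstash pipeline). The regular expression and the function name parse_icinga_line are assumptions. Following the module's current layout, the severity goes to log.level and the facility to icinga.main.facility; the facility path is a parameter because the Logstash pipeline uses icinga.facility instead. The timestamp is kept as the raw string; a real pipeline would parse it into a proper date.

```python
import re

# Assumed shape of the Icinga log header:
# "[<timestamp>] <severity>/<facility>: <message>"
HEADER = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>\w+)/(?P<facility>\w+): (?P<message>.*)$"
)


def parse_icinga_line(line, facility_field=("icinga", "main", "facility")):
    """Split one Icinga log line into ECS-style nested fields (hypothetical helper).

    facility_field is the path for the facility: ("icinga", "main", "facility")
    mirrors the current Filebeat module, ("icinga", "facility") the Logstash pipeline.
    """
    match = HEADER.match(line)
    if match is None:
        # Lines without the expected header are left untouched.
        return {"message": line}
    event = {
        "@timestamp": match["timestamp"],  # raw string, not parsed into a date
        "log": {"level": match["level"]},
        "message": match["message"],  # the "real" message without the header
    }
    # Create the nested objects along the facility path, then set the value.
    target = event
    for key in facility_field[:-1]:
        target = target.setdefault(key, {})
    target[facility_field[-1]] = match["facility"]
    return event


line = ("[2019-10-18 09:51:07 +0200] information/RemoteCheckQueue: "
        "items: 0, rate: 0/s (18/min 90/5min 270/15min);")
module_style = parse_icinga_line(line)
pipeline_style = parse_icinga_line(line, ("icinga", "facility"))
```

Both calls produce the same log.level and message; only the location of the facility differs, which is exactly the naming question raised in this thread.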

The current version of the icinga module puts information into log.level and RemoteCheckQueue into icinga.main.facility. While I like most of these changes, I don't know what to make of the main part. In our pipeline we just put RemoteCheckQueue into icinga.facility and use that field in if conditionals to determine which filters to apply. Although I already started a branch that moves icinga.facility to icinga.main.facility, I'm not sure how to go on from here.

  • Will the name of the field be changed again? We ship dashboards with the Logstash pipeline, which we would have to change, too. I don't want to do this over and over again.
  • What's main? More importantly, what's not-main? Should we go on with icinga.remotecheckqueue? I don't think that's reasonable, especially because we have overlapping fields across facilities, which is what we want (some information can be part of several components but should end up in the same field).
  • Should everything go into icinga.main? That's easily done, but I don't see the point.

Maybe you could shed some light on how to stay as close to ECS as possible.

We have an open issue on GitHub for this topic, too.

Thanks,
Thomas

Hi @widhalmt,

The "main" part comes from the way Filebeat modules are structured.

Compare the directory for the icinga module with the directory for the Apache httpd module. Modules are split into subsections (filesets), each of which parses a different log of a given source.

For Apache httpd, there's one section to parse the access log and one to parse the error log, and the custom fields from each file are nested accordingly (apache.access.* and apache.error.*). That's where "main" comes from for icinga: the "main" log is parsed by the pipeline in that subdirectory. It's possible to rename the fields for that source so they are nested directly at icinga.* instead of icinga.main.*. You can open an issue or PR on the Beats repo to accomplish that.
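To illustrate the difference between the two layouts, here is a minimal Python sketch that treats an event as a plain nested dict and moves the fields under icinga.main.* up to icinga.*. The helper unnest_fileset is hypothetical; it only shows the resulting field layout and is not how the Beats module or an ingest pipeline would perform the rename.

```python
def unnest_fileset(event, module="icinga", fileset="main"):
    """Move custom fields from <module>.<fileset>.* up to <module>.* (hypothetical helper).

    Returns a new dict and leaves the input event unchanged. If a key already
    exists directly under <module>, it is kept and the fileset value is dropped.
    """
    if module not in event:
        return dict(event)
    result = dict(event)
    module_obj = dict(event[module])  # shallow copy so the input is not modified
    nested = module_obj.pop(fileset, None)
    if isinstance(nested, dict):
        for key, value in nested.items():
            module_obj.setdefault(key, value)
    result[module] = module_obj
    return result


# Current module layout (icinga.main.*) versus the flatter icinga.* layout.
nested_event = {"message": "items: 0", "icinga": {"main": {"facility": "RemoteCheckQueue"}}}
flat_event = unnest_fileset(nested_event)
```

After the call, flat_event carries icinga.facility, the same field name the Logstash pipeline already uses.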

I think the above also answers your question "What's main and what's not-main?" In the icinga module, the other (not-main) sources are the debug log and the startup log. Does this make sense to you? If not, we're always open to suggestions to improve this :slight_smile:

I'll go over a few of your other points; please let me know if I've missed anything.

Doing some of the parsing in Beats may be useful in cases where the extracted information opens up more enrichment possibilities. For example, extracting a PID in Beats lets you then use the add_process_metadata processor to collect a lot of information about the process mentioned in the message.

However, I agree that in general server-side pipelines are easier to manage, since pipeline adjustments don't need to be rolled out to all of the monitored hosts.

On the Logstash vs. ingest pipeline question: Logstash is more flexible in some ways, but can require more maintenance over the long run. Note that Logstash can leverage existing ingest pipelines. Here's documentation on how to do it with Filebeat's ingest pipelines.

So perhaps a good approach would be to do strategic parsing in Beats (see the PID example), then do as much as possible in Elasticsearch ingest pipelines, since they can be used whether or not users run Logstash, and to provide documentation to Logstash users on how to leverage these ingest pipelines from their Logstash config.

I agree with stripping out some of the header and leaving only the rest as the real message :+1:

We've recently added direct Syslog support to ECS, which wasn't available when we initially migrated the icinga Filebeat module. So perhaps it would be best to migrate to those fields; you can check them out here: https://www.elastic.co/guide/en/ecs/current/ecs-log.html


Thanks for your thorough reply! That explains the main part of the field names. I'll talk to other Icinga team members about whether we want to open a PR to change it back. That's mostly because the main and debug logs come from the same engine and use the same format, just with different paths and log levels. While it seems weird at first to have the same log entries in two different places, it helps with Icinga's feature concept.

About the rest: thanks a lot for explaining. I'm in the middle of an important conference right now and will have to dive deeper into it later. I'll come back to this.

Thanks for now.

Yeah, I don't think there will be an issue removing that nesting. The HAProxy module is an example of a module that nests only one level deep, if you want to check it out.

I'm sure a PR to move the Icinga module in the same direction would be welcome as well. If anyone wants to filter for "just the debug logs" or "just the main logs", they can still filter based on the event.dataset field anyway.
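A minimal Python sketch of that point, assuming the datasets are named icinga.main and icinga.debug and that both logs put the facility directly at icinga.facility: even with the shared, un-nested field, event.dataset alone separates the two sources. The helper filter_by_dataset and the sample events are hypothetical; in Kibana or Elasticsearch this would be an ordinary filter on event.dataset.

```python
def filter_by_dataset(events, dataset):
    """Return the events whose event.dataset field equals `dataset` (hypothetical helper)."""
    return [e for e in events if e.get("event", {}).get("dataset") == dataset]


# Hypothetical sample: main and debug events share the same icinga.facility field.
EVENTS = [
    {"event": {"dataset": "icinga.main"}, "icinga": {"facility": "RemoteCheckQueue"}},
    {"event": {"dataset": "icinga.debug"}, "icinga": {"facility": "RemoteCheckQueue"}},
]

main_only = filter_by_dataset(EVENTS, "icinga.main")
```

This is why overlapping facilities across logs, as described earlier in the thread, don't require a per-log nesting level in the field names.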
