Elasticsearch JSON logs with Filebeat module and ingest pipeline

Hello,

Elasticsearch writes JSON logs by default in 7.x (thanks!). When configuring Filebeat's Elasticsearch module, I figured I could skip the ingest pipeline part since Filebeat is already shipping structured logs, but it looks like there are ingest pipelines specific to the JSON logs as well. What's the logic in parsing at this stage when the logs are already structured and Elasticsearch and Filebeat have full control over field names at the source?

I ask because adding the ingest pipeline brings a decent amount of complexity. There's a lot more to managing the lifecycle of run-once setup steps like ingest pipelines across Beats versions than there is in just doing config management for Filebeat configs.

Thanks,
Jeff


I see the JSON parsing of the message is done in the ingest pipeline, so the pipeline is pretty critical to ingesting useful logs.

What's the advantage of using the Filebeat elasticsearch module vs. just adding the logs as a new Filebeat input and doing the JSON decoding at the Filebeat layer?
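
(By "JSON decoding at the Filebeat layer" I mean something like the following sketch, using the log input's built-in json options; the path is an assumption based on a default install:)

- type: log
  paths:
    - /var/log/elasticsearch/*_server.json
  # Decode each line as JSON and place the keys at the top level of the event
  json.keys_under_root: true
  # Record decoding failures on the event instead of failing silently
  json.add_error_key: true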


I started going down the path of just ingesting without the module and decoding the JSON in Filebeat. I quickly found out that Elasticsearch's JSON logs can't be decoded line by line, since a single log entry (e.g. one containing a stacktrace) can span multiple lines. There are also some other oddities, like timestamp being used instead of @timestamp.

The good news is it looks like this is being worked on:
https://github.com/elastic/elasticsearch/issues/46119 (timestamp to @timestamp)
https://github.com/elastic/elasticsearch/pull/47105 (removing multiline stacktraces)

Long story short:

  • I'll use the Filebeat elasticsearch module and ingest pipeline for now.
  • It's less work (both up front and in long-term maintenance) to just decode proper JSON logs in Filebeat. I'm looking forward to being able to do that once the logs are restructured.

I'd still love to hear from some folks on whether my assumptions and future plans look sound. @pgomulka I'm tagging you here since your name is all over the :Core/Infra/Logging GitHub issues.

Thanks!


I decided to stick with doing the decoding in Filebeat once I realized I could handle the multiline JSON using the multiline settings together with the decode_json_fields processor. Here's what I ended up with; hope it's useful to others:

- type: log
  paths:
    - /var/log/elasticsearch/*_server.json
    - /var/log/elasticsearch/*_audit.json
    - /var/log/elasticsearch/*_index_search_slowlog.json
    - /var/log/elasticsearch/*_index_indexing_slowlog.json
    - /var/log/elasticsearch/*_deprecation.json
  # A new event starts with '{'; append any line that doesn't (e.g. the rest
  # of a multiline stacktrace) to the previous event
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match: after
  processors:
    - decode_json_fields:
        fields: ['message']
        target: "" # Decode into the root of the event
        overwrite_keys: true # Required to get the message field populated with the nested message field
        # add_error_key: true # Add this when you upgrade beats to 7.x

  # Add timestamp processor when you upgrade beats to 7.x
  # https://www.elastic.co/guide/en/beats/filebeat/current/processor-timestamp.html

We're still running Filebeat 6.x, so I added some stubs in there for features available in 7.x. The timestamp one is the big one: right now @timestamp reflects when Filebeat ingests the log, not when the log was generated, because the Elasticsearch log uses timestamp instead of @timestamp.
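
For reference, here's roughly what I expect that timestamp processor stub to become (a sketch I haven't run; the layouts are Go-style reference times and are assumptions that would need checking against the actual timestamp format in your logs):

processors:
  - timestamp:
      # Parse the timestamp field Elasticsearch writes into @timestamp
      field: timestamp
      layouts:
        # Assumed formats; verify against your own log entries
        - '2006-01-02T15:04:05,999Z'
        - '2006-01-02T15:04:05.999Z'
      test:
        - '2019-10-07T12:34:56,789Z'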


Hi @jeffspahr,

Indeed, as you've noted, once Elasticsearch generates JSON-formatted logs in ECS format, there won't be much work needed to ingest those logs with Filebeat. Until then, I'd suggest using the elasticsearch module, as it does a few renames and similar transformations to make the resulting event structure ECS-compatible (e.g. https://github.com/elastic/beats/blob/master/filebeat/module/elasticsearch/server/ingest/pipeline-json.json). Downstream applications such as the Logs UI or the Stack Monitoring UI in Kibana depend on the ECS structure, and the elasticsearch module guarantees it for you. Even if the ES log structure were to change in some way, the module would be updated to account for that while remaining backwards compatible. In other words, using the module means users don't have to understand the Elasticsearch JSON log structure, keep up with any changes to it, and make sure the end result is acceptable to the downstream applications mentioned above.

As for your point about whether the transformation could be done in Filebeat itself vs. an Elasticsearch ingest pipeline, that's a fair one. The tradeoff, of course, is where you want the processing power to be spent: whether you'd prefer Filebeat to act as a naive shipper, focusing on getting logs off the source machine as quickly as possible, or whether you'd be okay with it taking on the cost of parsing/processing those logs before shipping them to Elasticsearch. Once Elasticsearch starts generating JSON-formatted logs in ECS format, there should be minimal to no parsing required, and at that point we should definitely look at getting rid of the ingest pipeline set up by the elasticsearch module and instead have the module perform any parsing (if needed at all) using Filebeat processors.
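
To make that last idea concrete, a rough, hypothetical sketch (not the module's actual implementation) of doing that work with Filebeat processors instead of an ingest pipeline could look like this:

processors:
  # Decode the JSON log line into the root of the event
  - decode_json_fields:
      fields: ['message']
      target: ""
      overwrite_keys: true
  # Example of the kind of ECS rename the module's ingest pipeline
  # would otherwise handle (hypothetical field choice)
  - rename:
      fields:
        - from: "level"
          to: "log.level"
      ignore_missing: true
      fail_on_error: false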

Hope that helps,

Shaunak


@shaunak Thanks so much for that reply! I ended up staying away from the elasticsearch module and ingest pipeline, but I agree with all the points you made.
