Minimal Filebeat configuration for sending Logstash log messages in JSON format to Logstash

Until now, we have had Logstash produce its log messages in plain-text format (written to /var/log/logstash/logstash-plain.log). And we had Filebeat ship the log messages to a Logstash cluster where the log messages were processed using various grok/match filters.

But now I want to have Logstash produce log messages in JSON and avoid all the grok'ing and matching. Getting Logstash to produce log messages in JSON (or ndjson - newline-delimited JSON) is easy. Just add --log.format json to the command that starts Logstash, as in:

/usr/share/logstash/bin/logstash --path.settings "/etc/logstash" --log.format "json"
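
As far as I can tell, the same thing can also be set permanently in logstash.yml instead of on the command line:

log.format: json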

And out come log messages like this in /var/log/logstash/logstash-json.log:

{"level":"INFO","loggerName":"logstash.javapipeline","timeMillis":1690897355410,"thread":"[main]-pipeline-manager","logEvent":{"message":"Pipeline started","pipeline.id":"main"}}

{"level":"INFO","loggerName":"logstash.agent","timeMillis":1690897355457,"thread":"Agent thread","logEvent":{"message":"Pipelines running","count":1,"running_pipelines":[{"metaClass":{"metaClass":{"metaClass":{"running_pipelines":"[:main]","non_running_pipelines":[]}}}}]}}

But what do I put in my Filebeat configuration (/etc/filebeat/conf.d/logstash.yml) for shipping these messages to Logstash? Something like this, I hope:

- type: log
  paths:
  - /var/log/logstash/logstash-json.log
  encoding: plain
  document_type: json
  json:
      keys_under_root: true
      overwrite_keys: true
      add_error_key: true

  [more stuff]

  fields:
    type: logstash
    server: ********
  fields_under_root: false

Do I need anything else? I have experimented with the message_key option, both as message_key: logEvent and as message_key: logEvent.message, but without any noticeable result or difference.

Over at the Logstash side, I have been experimenting with a filter-configuration like this:

filter {
  if [fields][type] == "logstash" {
    json {
      source => "message"
    }
  }
}

output {
  if [fields][type] == "logstash" {
    elasticsearch {
      [blah blah blah]
    }
  }
}

It sort of works. The log messages appear and can be viewed using Kibana, but every single message has a _grokparsefailure tag attached to it, even though I do not have a grok filter in the logstash.conf file?!?

And there is no message field, only a logEvent.message field.

Does anyone have a working, minimal Filebeat configuration and a working, minimal Logstash filter that can help me out? Why do I get a _grokparsefailure when I am not grokking, and how do I copy the information in the logEvent fields to the (normal) message field?

A minimal configuration would be just the path:

- type: log
  paths:
  - /var/log/logstash/logstash-json.log

But I would use the filestream input since the log input is deprecated.
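
A rough filestream equivalent would look something like this (a sketch, untested; the ndjson parser options may differ a bit between Filebeat versions):

- type: filestream
  id: logstash-json                # filestream inputs need a unique id
  paths:
    - /var/log/logstash/logstash-json.log
  parsers:
    - ndjson:
        target: ""                 # put the decoded keys at the root of the event
        overwrite_keys: true
        add_error_key: true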

This happens because you are already parsing the JSON in Filebeat, so there is no need to parse it again in Logstash.

These lines in your Filebeat configuration tell Filebeat to decode the JSON:

  json:
      keys_under_root: true
      overwrite_keys: true
      add_error_key: true

If you want to parse the JSON in Logstash, you need to remove them.
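
Also, if you keep the decoding in Filebeat and just want the text back in a plain message field, a mutate filter on the Logstash side should be able to copy it over. An untested sketch:

filter {
  if [fields][type] == "logstash" {
    mutate {
      # copy the nested logEvent.message into the top-level message field
      copy => { "[logEvent][message]" => "message" }
    }
  }
}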

I don't think this happens; if you have a _grokparsefailure tag in your event, then you have a grok filter somewhere in the pipeline. How are you running logstash? Are you using multiple pipelines with pipelines.yml pointing to your pipeline configuration, or are you using the default pipelines.yml? The default points to /etc/logstash/conf.d/*.conf, and this will merge every .conf file in that folder into one pipeline.

@leandrojmp wrote:

I don't think this happens; if you have a _grokparsefailure tag in your event, then you have a grok filter somewhere in the pipeline. […]

I run Logstash with the default pipeline, with some 80+ configuration files. I slimmed it down to just 2 files: input.conf and logstash.conf. There is not a single grok-filter directive in sight, but I still get the _grokparsefailure.

Do you know if there is a logger, like logstash.filters.grok, that I can enable in trace mode to see where and what goes wrong?

Why don't you use the LS (Logstash) module for FB (Filebeat)?

Good question! Because I did not know of its existence! Let me try it, and I will return with an update.

The logger would be something like this:

logger.logstash.filters.grok

You can change the level with the following request on your Logstash server:

curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{ "logger.logstash.filters.grok" : "DEBUG" }'
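
And if I remember correctly, you can put the levels back to the defaults with the reset endpoint:

curl -XPUT 'localhost:9600/_node/logging/reset?pretty'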

But as I said to have a _grokparsefailure a grok filter needs to exist somewhere, running grep grok *.conf on your path does not return anything? Maybe some leftover config?
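
For example, something like this would also catch a leftover grok in any subfolder, not just the current directory:

grep -ri grok /etc/logstash/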

Also, are you using ingest pipelines in Elasticsearch? Not sure, but if an ingest pipeline has a grok and it fails I think it would also generate the same tag.
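
One way to check is to list the ingest pipelines and look for a grok processor, something like this (assuming Elasticsearch on the default port with no authentication):

curl -s 'localhost:9200/_ingest/pipeline?pretty' | grep -i grok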

I think I got the logstash module enabled and working - at least somewhat. My problem now is that the documents end up in my logstash-unknown index instead of the logstash-logstash index.

With the old method, using a conf-file, I had this in my /etc/filebeat/conf.d/logstash.yml:

  paths:
  - /var/log/logstash/logstash-json.log

  fields:
    type: logstash
    server: mgxlostapp04
  fields_under_root: false

Together with this in my /etc/logstash/conf.d/logstash.conf:

filter {
  if [fields][type] == "logstash" {
    mutate {
      add_tag => "Parsed via logstash on host *******"
      add_tag => "Ruleset logstash"
    }

    ... ... ...
  }
}

output {
  if [fields][type] == "logstash" {
    elasticsearch {
      hosts               => ["********"]
      ... ... ...
      manage_template     => true
      template_overwrite  => true
      template            => "/etc/logstash/templates/logstash-logstash.json"
      template_name       => "logstash-logstash"
      ... ... ...
    }
  }
}

that sent the Logstash documents to the correct index.

I tried to replace if [fields][type] == "logstash" with if [event][module] == "logstash", but it did not work. The documents ended up in logstash-unknown.
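
In other words, what I tried was roughly this (a sketch, with the details elided as above):

filter {
  if [event][module] == "logstash" {
    ... ... ...
  }
}

output {
  if [event][module] == "logstash" {
    elasticsearch {
      ... ... ...
    }
  }
}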

I did the grep thing, and there was nothing groky in the conf.d directory when I had reduced it to just the input.conf and logstash.conf files.

I do not think we use any ingest pipelines in Elasticsearch.

I added the following to my log4j2.properties

logger.jba.name  = logstash.filters
logger.jba.level = trace

and got some DEBUG messages from loggers named logstash.filters.json and logstash.filters.mutate, but not a word from logstash.filters.grok.

I did notice this in /usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-grok-4.4.2/lib/logstash/filters/grok.rb:

    # Append values to the `tags` field when there has been no
    # successful match
    config :tag_on_failure, :validate => :array, :default => ["_grokparsefailure"]

Could it be that the grok filter somehow gets initialized and activated even though it is not actively used by a filter, and decides to tag everything with a grok parse failure because no match was done?

Is there a noop, a dummy match I can try? Something like:

  grok {
    match => { "message" => "%{GREEDYDATA:dummy}" }
  }

to make grok happy?

I don't think so, this doesn't seem to be possible.

It would make no difference; you would just be adding a grok filter that matches everything, so this grok would not emit the _grokparsefailure tag, but you still need to find the source of this tag.

I would double-check the pipelines and logstash configurations to make sure that you are indeed running the pipelines without any grok and not some leftover config. Can you share your pipelines.yml file?

Do you have only one Logstash?

Also, what is the source of your data? Can you share a document where you have the _grokparsefailure?
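
If it is easier, you could also temporarily add a debug output to the pipeline to dump a raw event and see exactly which fields and tags it ends up with, something like:

output {
  stdout { codec => rubydebug }
}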

As mentioned, the _grokparsefailure is the default failure tag added by the grok processor. If you didn't find any grok in your configurations, the reason may be one of these things:

  • You are not running the config you think you are running; check the logstash process with ps to validate that it is indeed running a configuration without any grok (see the sketch after this list).
  • You have another instance with a config that has a grok filter running and sending data to the same indices.
  • The tag is coming from the source, maybe?
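
For the first point, something like this shows the full command line of the running Logstash process, including which --path.settings or -f it was actually started with:

ps -ef | grep '[l]ogstash'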
