Parsing system.syslog with pipeline logs-system.syslog-1.54.0

Hello

We have the following issue after enabling log collection with Elastic Agent and the System integration v1.54.0.

We ship logs to Logstash and then to Elasticsearch. The ES ingest pipeline logs-system.syslog-1.54.0 tries to grok the message field, but when I write the Logstash output to a file, events from Elastic Agent (for system.syslog) don't have this field; they only have event.original.

Grok section of the logs-system.syslog pipeline:

 {
    "grok": {
      "field": "message",
      "patterns": [
        "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}",
        "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{GREEDYMULTILINE:system.syslog.message}",
        "%{TIMESTAMP_ISO8601:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}"
      ],
      "pattern_definitions": {
        "GREEDYMULTILINE": "(.|\n)*"
      },
      "ignore_missing": true
    }
  },

Example log (file output from Logstash):

{
	"input": {
		"type": "log"
	},
	"agent": {
		"version": "8.12.1",
		"name": "FOO-BAR",
		"ephemeral_id": "3ac98aba-dd7c-4bf9-ab1e-e1df821a2c21",
		"id": "5082c6d4-5539-406d-92a7-7f5f24bf9e31",
		"type": "filebeat"
	},
	"@version": "1",
	"elastic_agent": {
		"version": "8.12.1",
		"snapshot": false,
		"id": "5082c6d4-5539-406d-92a7-7f5f24bf9e31"
	},
	"host": {
		"ip": [
			"fe80::ecee:eeff:feee:eeee"
		],
		"containerized": false,
		"id": "1f093cab5e0f4556b431be1e432376aa",
		"os": {
			"kernel": "5.10.0-21-amd64",
			"version": "11 (bullseye)",
			"name": "Debian GNU/Linux",
			"type": "linux",
			"codename": "bullseye",
			"platform": "debian",
			"family": "debian"
		},
		"mac": [
			"EE-EE-EE-EE-EE-EE"
		],
		"architecture": "x86_64",
		"name": "FOO-BAR",
		"hostname": "FOO-BAR"
	},
	"data_stream": {
		"dataset": "system.syslog",
		"type": "logs",
		"namespace": "default"
	},
	"tags": [
		"beats_input_codec_plain_applied",
	],
	"@timestamp": "2024-03-19T13:24:05.274Z",
	"ecs": {
		"version": "8.0.0"
	},
	"event": {
		"original": "Mar 19 14:24:05 FOO-BAR REMOVED",
		"dataset": "system.syslog",
		"timezone": "+01:00"
	},
	"log": {
		"offset": 272868660,
		"file": {
			"path": "/var/log/messages"
		},
		"logger": "FOO-BAR"
	}
}

This can be worked around with the following filter in Logstash, but shouldn't the agent send a message field, or the pipeline use the event.original field?

filter {
  if [event][dataset] == "system.syslog" {
    mutate {
      copy => { "[event][original]" => "[message]" }
      id => "mutate_syslog_b2"
    }
  }
}

The issue is that all Elastic Agent integrations expect the data to be sent by the agent directly to Elasticsearch, where it is processed by the ingest pipelines. When you add Logstash between the agent and Elasticsearch, some things can stop working correctly.

There are a couple of issues about ingest pipelines failing or not working correctly when you have Logstash between the agents and Elasticsearch. I had one with system-auth, a quick search returned this other issue about system-syslog, and there is also this one, which affects multiple integrations.

Since this happens at the beginning of the ingest pipeline, the only way to solve it is to make a PR changing the ingest pipeline to use event.original, but that change also needs to handle events that do not have event.original.

So it would need to check whether event.original exists and, if it does not, move the content of the message field into it; the grok processor would then use the event.original field.

Something like this:

---
description: Pipeline for parsing Syslog messages.
processors:
  - rename:
      if: ctx.event?.original == null
      field: message
      target_field: event.original
      ignore_missing: true
  - grok:
      field: event.original
      patterns:
        - '%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\[%{POSINT:process.pid:long}\])?: %{GREEDYMULTILINE:system.syslog.message}'
        - '%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{GREEDYMULTILINE:system.syslog.message}'
        - '%{TIMESTAMP_ISO8601:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\[%{POSINT:process.pid:long}\])?: %{GREEDYMULTILINE:system.syslog.message}'
      pattern_definitions:
        GREEDYMULTILINE: |-
          (.|
          )*
      ignore_missing: true
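
A change like this can be checked with the simulate API before opening the PR. A minimal sketch: only one of the grok patterns is included for brevity, and the two sample docs reuse the line from the event above, one carrying only event.original (agent via Logstash) and one carrying only message (agent direct to ES):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "rename": {
          "if": "ctx.event?.original == null",
          "field": "message",
          "target_field": "event.original",
          "ignore_missing": true
        }
      },
      {
        "grok": {
          "field": "event.original",
          "patterns": [
            "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{GREEDYMULTILINE:system.syslog.message}"
          ],
          "pattern_definitions": {
            "GREEDYMULTILINE": "(.|\n)*"
          },
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    { "_source": { "event": { "original": "Mar 19 14:24:05 FOO-BAR REMOVED" } } },
    { "_source": { "message": "Mar 19 14:24:05 FOO-BAR REMOVED" } }
  ]
}

Both docs should come out with the same system.syslog.timestamp and system.syslog.message fields.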

I also use Logstash between Elastic Agent and Elasticsearch, but I disabled both ECS compatibility and enrichment on the elastic_agent input.
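
Roughly like this on the input side (a sketch, not my exact config; the port is a placeholder, and enrich requires a Logstash version whose elastic_agent/Beats input supports that option):

input {
  elastic_agent {
    port => 5044                    # placeholder port
    ecs_compatibility => disabled   # don't remap fields to ECS conventions
    enrich => none                  # don't add the input's extra metadata to events
  }
}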

Hello

Thanks for the reply. Yes, I already noticed that system-auth is also broken.
On 1.54 I have to remove log.syslog in Logstash to make it work.
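
That is, something along these lines, in the same style as the workaround filter above (a sketch of that kind of filter; the exact field path is assumed from the log.syslog name):

filter {
  if [event][dataset] == "system.syslog" {
    mutate {
      # drop the log.syslog object added upstream so the 1.54 pipeline parses the event
      remove_field => ["[log][syslog]"]
    }
  }
}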