Filebeat->Kafka->Logstash->Elasticsearch fails for system.auth: "Provided Grok expressions do not match field value"

The ingest scenario below perplexingly fails for the Filebeat system module with the auth fileset with Provided Grok expressions do not match field value. It doesn't matter what the event is from /var/log/secure; every single one fails with a Grok failure.

Example Elasticsearch document with error.message is below.

Ironically, plugging in event.original into Stack Management > Ingest Pipelines > Edit > Test pipeline, the pipeline succeeds using the exact same filebeat-8.5.3-system-auth-pipeline that appears to have failed here.

Sample event from the end host:

grep 'session opened' /var/log/secure | tail -n1
Jan 12 21:29:07 REDACTED sudo: pam_unix(sudo-i:session): session opened for user root by ec2-user(uid=0)

Here is this event from tailing Kafka:

{"@timestamp":"2023-01-12T22:07:19.133Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.5.3","pipeline":"filebeat-8.5.3-system-auth-pipeline"},"fileset":{"name":"auth"},"service":{"type":"system"},"agent":{"ephemeral_id":"5f25a263-e27a-4b41-9233-81dad72cd450","id":"af433b61-1a37-4916-94f7-0e7d36f84d97","name":"ip-REDACTED.REDACTED.compute.internal","type":"filebeat","version":"8.5.3"},"ecs":{"version":"8.0.0"},"host":{"name":"ip-REDACTED.REDACTED.compute.internal"},"log":{"file":{"path":"/var/log/secure"},"offset":34944},"message":"Jan 12 22:07:14 ip-REDACTED sudo: pam_unix(sudo-i:session): session opened for user root by ec2-user(uid=0)","input":{"type":"log"},"event":{"module":"system","dataset":"system.auth","timezone":"+00:00"}}

The pipeline was added to Elasticsearch using filebeat setup --pipelines for filebeat/elasticsearch 8.5.3.

Logstash config:

    input {
      kafka {
        topics => ["system"]
        bootstrap_servers => "my-cluster-kafka-bootstrap:9092"
        codec => "json"
        auto_offset_reset => "earliest"
        group_id => "logstash_system_group"
        client_id => "logstash_system_consumer"
        consumer_threads => 5
      }
    }
    output {
      if [@metadata][pipeline] {
        elasticsearch {
          hosts => ["https://REDACTED:9200"]
          ssl => true
          ssl_certificate_verification => true
          cacert => "/etc/ssl/elasticsearch-certs/tls.crt"
          manage_template => false
          index => "%{[@metadata][beat]}-%{[@metadata][version]}"
          ecs_compatibility => "disabled"
          action => "create"
          user => "${username}"
          password => "${password}"
          pipeline => "%{[@metadata][pipeline]}"
        }
      } else {
        elasticsearch {
          hosts => ["https://REDACTED:9200"]
          ssl => true
          ssl_certificate_verification => true
          cacert => "/etc/ssl/elasticsearch-certs/tls.crt"
          manage_template => false
          index => "%{[@metadata][beat]}-%{[@metadata][version]}"
          ecs_compatibility => "disabled"
          action => "create"
          user => "${username}"
          password => "${password}"
        }
      }
    }

Failed Elasticsearch document:

{
  "_index": ".ds-filebeat-8.5.3-2023.01.03-000001",
  "_id": "REDACTED",
  "_version": 1,
  "_score": 0,
  "_source": {
    "input": {
      "type": "log"
    },
    "agent": {
      "name": "ip-REDACTED.REDACTED.compute.internal",
      "id": "af433b61-1a37-4916-94f7-0e7d36f84d97",
      "type": "filebeat",
      "ephemeral_id": "c514d01a-c282-4c2b-a91c-09c05af19b35",
      "version": "8.5.3"
    },
    "@timestamp": "2023-01-12T18:17:05.111Z",
    "ecs": {
      "version": "8.0.0"
    },
    "log": {
      "file": {
        "path": "/var/log/secure"
      },
      "offset": 25919
    },
    "service": {
      "type": "system"
    },
    "@version": "1",
    "host": {
      "name": "ip-REDACTED.REDACTED.compute.internal"
    },
    "event": {
      "ingested": "2023-01-12T18:17:06.882875402Z",
      "original": "{\"@timestamp\":\"2023-01-12T18:17:05.111Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\",\"pipeline\":\"filebeat-8.5.3-system-auth-pipeline\"},\"input\":{\"type\":\"log\"},\"fileset\":{\"name\":\"auth\"},\"agent\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\",\"type\":\"filebeat\",\"version\":\"8.5.3\",\"ephemeral_id\":\"c514d01a-c282-4c2b-a91c-09c05af19b35\",\"id\":\"af433b61-1a37-4916-94f7-0e7d36f84d97\"},\"ecs\":{\"version\":\"8.0.0\"},\"host\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\"},\"log\":{\"offset\":25919,\"file\":{\"path\":\"/var/log/secure\"}},\"message\":\"Jan 12 12:06:33 ip-REDACTED su: pam_unix(su-l:session): session closed for user root\",\"event\":{\"dataset\":\"system.auth\",\"module\":\"system\",\"timezone\":\"+00:00\"},\"service\":{\"type\":\"system\"}}",
      "timezone": "+00:00",
      "module": "system",
      "dataset": "system.auth"
    },
    "fileset": {
      "name": "auth"
    },
    "message": "Jan 12 12:06:33 ip-REDACTED su: pam_unix(su-l:session): session closed for user root",
    "error": {
      "message": "Provided Grok expressions do not match field value: [{\"@timestamp\":\"2023-01-12T18:17:05.111Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\",\"pipeline\":\"filebeat-8.5.3-system-auth-pipeline\"},\"input\":{\"type\":\"log\"},\"fileset\":{\"name\":\"auth\"},\"agent\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\",\"type\":\"filebeat\",\"version\":\"8.5.3\",\"ephemeral_id\":\"c514d01a-c282-4c2b-a91c-09c05af19b35\",\"id\":\"af433b61-1a37-4916-94f7-0e7d36f84d97\"},\"ecs\":{\"version\":\"8.0.0\"},\"host\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\"},\"log\":{\"offset\":25919,\"file\":{\"path\":\"/var/log/secure\"}},\"message\":\"Jan 12 12:06:33 ip-REDACTED su: pam_unix(su-l:session): session closed for user root\",\"event\":{\"dataset\":\"system.auth\",\"module\":\"system\",\"timezone\":\"+00:00\"},\"service\":{\"type\":\"system\"}}]"
    }
  },
  "fields": {
    "event.original": [
      "{\"@timestamp\":\"2023-01-12T18:17:05.111Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\",\"pipeline\":\"filebeat-8.5.3-system-auth-pipeline\"},\"input\":{\"type\":\"log\"},\"fileset\":{\"name\":\"auth\"},\"agent\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\",\"type\":\"filebeat\",\"version\":\"8.5.3\",\"ephemeral_id\":\"c514d01a-c282-4c2b-a91c-09c05af19b35\",\"id\":\"af433b61-1a37-4916-94f7-0e7d36f84d97\"},\"ecs\":{\"version\":\"8.0.0\"},\"host\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\"},\"log\":{\"offset\":25919,\"file\":{\"path\":\"/var/log/secure\"}},\"message\":\"Jan 12 12:06:33 ip-REDACTED su: pam_unix(su-l:session): session closed for user root\",\"event\":{\"dataset\":\"system.auth\",\"module\":\"system\",\"timezone\":\"+00:00\"},\"service\":{\"type\":\"system\"}}"
    ],
    "fileset.name": [
      "auth"
    ],
    "input.type": [
      "log"
    ],
    "log.offset": [
      25919
    ],
    "message": [
      "Jan 12 12:06:33 ip-REDACTED su: pam_unix(su-l:session): session closed for user root"
    ],
    "agent.hostname": [
      "ip-REDACTED.REDACTED.compute.internal"
    ],
    "service.type": [
      "system"
    ],
    "agent.type": [
      "filebeat"
    ],
    "event.ingested": [
      "2023-01-12T18:17:06.882Z"
    ],
    "@timestamp": [
      "2023-01-12T18:17:05.111Z"
    ],
    "event.module": [
      "system"
    ],
    "agent.id": [
      "af433b61-1a37-4916-94f7-0e7d36f84d97"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "error.message": [
      "Provided Grok expressions do not match field value: [{\"@timestamp\":\"2023-01-12T18:17:05.111Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\",\"pipeline\":\"filebeat-8.5.3-system-auth-pipeline\"},\"input\":{\"type\":\"log\"},\"fileset\":{\"name\":\"auth\"},\"agent\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\",\"type\":\"filebeat\",\"version\":\"8.5.3\",\"ephemeral_id\":\"c514d01a-c282-4c2b-a91c-09c05af19b35\",\"id\":\"af433b61-1a37-4916-94f7-0e7d36f84d97\"},\"ecs\":{\"version\":\"8.0.0\"},\"host\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\"},\"log\":{\"offset\":25919,\"file\":{\"path\":\"/var/log/secure\"}},\"message\":\"Jan 12 12:06:33 ip-REDACTED su: pam_unix(su-l:session): session closed for user root\",\"event\":{\"dataset\":\"system.auth\",\"module\":\"system\",\"timezone\":\"+00:00\"},\"service\":{\"type\":\"system\"}}]"
    ],
    "log.file.path": [
      "/var/log/secure"
    ],
    "@version": [
      "1"
    ],
    "agent.ephemeral_id": [
      "c514d01a-c282-4c2b-a91c-09c05af19b35"
    ],
    "agent.name": [
      "ip-REDACTED.REDACTED.compute.internal"
    ],
    "agent.version": [
      "8.5.3"
    ],
    "host.name": [
      "ip-REDACTED.REDACTED.compute.internal"
    ],
    "event.dataset": [
      "system.auth"
    ],
    "event.timezone": [
      "+00:00"
    ]
  }
}

From Elasticsearch logs:

{"@timestamp":"2023-01-12T19:05:38.374Z", "log.level": "WARN", "message":"regular expression has redundant nested repeat operator * /^(?<TIMESTAMP:system.auth.timestamp>(?:(?:(?:(?>\\d\\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01][0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?)|(?:(?:\\b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y|i)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\\b) +(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])) (?:(?!<[0-9])(?:(?:2[0123]|[01]?[0-9])):(?:(?:[0-5][0-9]))(?::(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))(?![0-9]))))) (?<SYSLOGHOST:host.hostname>(?:(?:(?:(?:(?:((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:)))(%.+)?)|(?:(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9]))))|(?:\\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\\.?|\\b)))))? (?<DATA:process.name>.*?)(?:\\[(?<POSINT:process.pid:long>\\b(?:[1-9][0-9]*)\\b)\\])?:(?:\\s*)+(?<GREEDYMULTILINE:_temp.message>(.|\\n)*)$/", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[REDACTED-es-default-2][transport_worker][T#8]","log.logger":"org.elasticsearch.ingest.common.GrokProcessor","trace.id":"a519613542055b4c94e1ace8c6a8ae55","elasticsearch.cluster.uuid":"5-RI1oI_QpaWgVJ6gviEug","elasticsearch.node.id":"REDACTED-TeRO-g","elasticsearch.node.name":"REDACTED-es-default-2","elasticsearch.cluster.name":"REDACTED"}

The Grok in question:

Are you suggesting editing filebeat-8.5.3-system-auth-pipeline?

No, I misunderstood

I have a suspicion of what's going on here and it seems more related to Beats.

First I set Logstash config to:

    output {
      file {
        path => "/tmp/logstash.log"
        codec => rubydebug {
          metadata => true
        }
      }
    }

and then tail -F -n0 /tmp/logstash.log

The output of an event is as follows:

{
         "input" => {
        "type" => "log"
    },
         "agent" => {
                "name" => "ip-REDACTED.REDACTED.compute.internal",
                  "id" => "af433b61-1a37-4916-94f7-0e7d36f84d97",
                "type" => "filebeat",
        "ephemeral_id" => "0574dc2d-f48b-4e5b-955a-1539e4212adf",
             "version" => "8.5.3"
    },
    "@timestamp" => 2023-01-12T23:57:21.928Z,
           "ecs" => {
        "version" => "8.0.0"
    },
           "log" => {
          "file" => {
            "path" => "/var/log/secure"
        },
        "offset" => 36158
    },
       "service" => {
        "type" => "system"
    },
     "@metadata" => {
            "beat" => "filebeat",
        "pipeline" => "filebeat-8.5.3-system-auth-pipeline",
         "version" => "8.5.3",
            "type" => "_doc"
    },
      "@version" => "1",
          "host" => {
        "name" => "ip-REDACTED.REDACTED.compute.internal"
    },
         "event" => {
        "original" => "{\"@timestamp\":\"2023-01-12T23:57:21.928Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"8.5.3\",\"pipeline\":\"filebeat-8.5.3-system-auth-pipeline\"},\"fileset\":{\"name\":\"auth\"},\"service\":{\"type\":\"system\"},\"input\":{\"type\":\"log\"},\"ecs\":{\"version\":\"8.0.0\"},\"host\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\"},\"message\":\"Jan 12 23:57:17 ip-REDACTED sudo: pam_unix(sudo-i:session): session opened for user root by ec2-user(uid=0)\",\"event\":{\"dataset\":\"system.auth\",\"timezone\":\"+00:00\",\"module\":\"system\"},\"agent\":{\"name\":\"ip-REDACTED.REDACTED.compute.internal\",\"type\":\"filebeat\",\"version\":\"8.5.3\",\"ephemeral_id\":\"0574dc2d-f48b-4e5b-955a-1539e4212adf\",\"id\":\"af433b61-1a37-4916-94f7-0e7d36f84d97\"},\"log\":{\"offset\":36158,\"file\":{\"path\":\"/var/log/secure\"}}}",
        "timezone" => "+00:00",
          "module" => "system",
         "dataset" => "system.auth"
    },
       "fileset" => {
        "name" => "auth"
    },
       "message" => "Jan 12 23:57:17 ip-REDACTED sudo: pam_unix(sudo-i:session): session opened for user root by ec2-user(uid=0)"
}

now check out the grok: it attempts to grok event.original:

However, it should be grok'ing the message field as does its cousin in the same module, system.syslog:

In other words, it would appear Elasticsearch ingest pipeline filebeat-8.5.3-system-auth-pipeline is attempting to grok an escaped JSON documented rather than just the raw message.

Until @andrewkroh merged the latest PR those patterns were applied to [message] rather than [event][original]. I guess there is an implicit assumption that the two are equivalent.

The json codec on the kafka input definitely breaks that assumption.

1 Like

Funny thing is the Elasticsearch docs specifically recommend sending and receiving JSON from Kafka. IMO, event.original should never be set to the full string-ified JSON. It should just be the raw message itself.

Fixing one error leads to another.

    filter {
      if [event][module] == "system" and [event][dataset] == "system.auth" {
        mutate {
          copy => { "message" => "[event][original]" }
        }
      }
    }

error.message is now: field [message] already exists.

Here's a workaround for system.auth - sync pipeline with Fleet integration by andrewkroh · Pull Request #32360 · elastic/beats · GitHub and https://github.com/elastic/integrations/pull/3705

    filter {
      if [event][module] == "system" and [event][dataset] == "system.auth" {
        mutate {
          copy => { "message" => "[event][original]" }
          remove_field => [ "message" ]
        }
      }
    }

Can Elastic just call Beats deprecated at this point?

This bug effects more than just a system module.

See here

1 Like

I've found a separate fix for this which does not require the filter shown above.

Set --pipeline.ecs_compatibility disabled at the command line or in logstash.yml

The confusing part was that it was not sufficient to only set

ecs_compatibility => "disabled"

within output.elasticsearch in logstash.conf. I had to set both to prevent Logstash from changing schema.

This seems like the better solution as it is in-line with the intent here, which is to just Logstash as a consumer to consume events from Kafka and push them unaltered to Elasticsearch where processing happens on the ingest node.

1 Like