Provided Grok patterns do not match field value

Hello,

I have an issue for a few days I can resolve.

I want to parse "Feb 7 00:23:34 ig1-edge-dc3-01_re0 rpd[1524]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 158.58.176.35 (External AS 200271) changed state from EstabSync to Established (event RsyncAck) (instance master)"
with
"%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname}_%{DATA:host.re} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{DATA:bgp.state}: %{DATA:bgp.protocol} %{IP:client.ip} \\(%{DATA:bgp.as}\\) %{GREEDYMULTILINE:system.syslog.message}"

I first though it was because there were IPV4 and IPV6 addresses but it wasn't the issue.

This message "Feb 10 22:40:26 ig1-ar-dc3-01 rpd[2121]: BGP_IO_ERROR_CLOSE_SESSION: BGP peer 158.58.180.12 (External AS 65200): Error event Operation timed out(60) for I/O session - closing it" have a similar syntax and it parses but not the first one.

I don't understand why, can you help me please ?

This is much easier to debug for others if you provide the full ingest pipeline simulate call. Please do so the next time.

I don't think there is a GREEDYMULTILINE grok pattern unless you created - which means you need to provide a full reproduction. Looks ok otherwise to me, so just try to use GREEDYDATA instead or provide a full reproduction.

Here is the full pipeline :

PUT /_ingest/pipeline/logs_pipeline
{
  "description": "Pipeline for parsing bgp messages.",
  "processors": [
    {
      "grok": {
        "pattern_definitions": {
          "GREEDYMULTILINE": "(.|)*"
        },
        "ignore_missing": true,
        "field": "message",
        "patterns": [
        "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname}_%{DATA:host.re} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{DATA:bgp.state}: %{DATA:bgp.protocol} %{IP:client.ip} \\(%{DATA:bgp.as}\\) %{GREEDYMULTILINE:system.syslog.message}"
         "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process.process_pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}",
         "%{SYSLOGTIMESTAMP:system.syslog.timestamp} %{GREEDYMULTILINE:system.syslog.message}",
        "%{TIMESTAMP_ISO8601:system.syslog.timestamp} %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process_pid.pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}"

        ]
      }
    },
    {
      "remove": {
        "field": "message"
      }
    },
    {
      "rename": {
        "field": "system.syslog.message",
        "target_field": "message",
        "ignore_missing": true
      }
    },
    {
      "date": {
        "formats": [
          "MMM  d HH:mm:ss",
          "MMM dd HH:mm:ss",
          "MMM d HH:mm:ss",
          "ISO8601"
        ],
        "on_failure": [
          {
            "append": {
              "field": "error.message",
              "value": "{{ _ingest.on_failure_message }}"
            }
          }
        ],
        "if": "ctx?.event?.timezone == null",
        "field": "system.syslog.timestamp",
        "target_field": "@timestamp"
      }
    },
    {
      "date": {
        "if": "ctx?.event?.timezone != null",
        "field": "system.syslog.timestamp",
        "target_field": "@timestamp",
        "formats": [
          "MMM  d HH:mm:ss",
          "MMM dd HH:mm:ss",
          "MMM d HH:mm:ss",
          "ISO8601"
        ],
        "timezone": "{{ event.timezone }}",
        "on_failure": [
          {
            "append": {
              "field": "error.message",
              "value": "{{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    },
    {
      "remove": {
        "field": "system.syslog.timestamp"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "error.message",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}

have you tried replacing with that one with GREEDYDATA as suggested previously?

I still doen't parse the field the way I would like to.

please add a full simulate pipeline API call including the full pipeline and a sample document, so I can reproduce and have all the needed information in one place. Thx!

Sorry for the late answer, I will post this today. Thanks for your time.

I manage to fix the issue. I have multiple grok patterns and it seems there is a priority order so I change the position and now it works.

just FYI, there is no priority order. The order is sequential. If not, please provide an example and we can take a deeper look.

I modified my pipeline, here it is :

 PUT /_ingest/pipeline/logs_pipeline
    {
      "description": "Pipeline for parsing bgp messages.",
      "processors": [
        {
          "grok": {
            "pattern_definitions" : {
                "GREEDYMULTILINE" : """(.|
    )*""",
                "BGP": "Internal AS [0-9]*",
                "HANDLE_STATE": "bgp_handle_notify", 
                "IFACHANGE_STATE": "bgp_ifachange_group",
                "CLOSE_SESSION": "BGP_IO_ERROR_CLOSE_SESSION",
                "SEND_SESSION": "bgp_send",
                "SPECIAL_CHAR": ":" 
              },
            "ignore_missing": true,
            "trace_match": true,
            "field": "message",
            "patterns": [
               /* bgp_ifachange_group */
                "(?:%{TIMESTAMP_ISO8601:system.syslog.timestamp}|%{SYSLOGTIMESTAMP:system.syslog.timestamp})? %{SYSLOGHOST:host.hostname}(?:_%{DATA:host.re})? %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{IFACHANGE_STATE:bgp.state}:(?:%{NUMBER:bgp.number})?: %{DATA:bgp.protocol} %{IP:client.ip} \\(%{DATA:bgp.as}\\)(?:%{SPECIAL_CHAR})? %{GREEDYMULTILINE:system.syslog.message}",
                /* bgp_handle_notify */
                 "(?:%{TIMESTAMP_ISO8601:system.syslog.timestamp}|%{SYSLOGTIMESTAMP:system.syslog.timestamp})? %{SYSLOGHOST:host.hostname}(?:_%{DATA:host.re})? %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{HANDLE_STATE:bgp.state}:%{NUMBER:bgp.number}(?:%{SPECIAL_CHAR})? \\(%{DATA:bgp.as}\\)(?:%{SPECIAL_CHAR})? %{GREEDYMULTILINE:system.syslog.message}",
                /* bgp_send */
                "(?:%{TIMESTAMP_ISO8601:system.syslog.timestamp}|%{SYSLOGTIMESTAMP:system.syslog.timestamp})? %{SYSLOGHOST:host.hostname}(?:_%{DATA:host.re})? %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{SEND_SESSION:bgp.state}: %{DATA:bgp.protocol} %{IP:client.ip} \\(%{DATA:bgp.as}\\) %{GREEDYMULTILINE:system.syslog.message}",
                /* BGP_IO_ERROR_CLOSE_SESSION uniquement */
                "(?:%{TIMESTAMP_ISO8601:system.syslog.timestamp}|%{SYSLOGTIMESTAMP:system.syslog.timestamp})? %{SYSLOGHOST:host.hostname}(?:_%{DATA:host.re})? %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{CLOSE_SESSION:bgp.state}: %{DATA:bgp.protocol}(?:%{SPECIAL_CHAR})? %{IP:client.ip} \\(%{DATA:bgp.as}\\): %{GREEDYMULTILINE:system.syslog.message}",
                 /* BGP */
                "(?:%{TIMESTAMP_ISO8601:system.syslog.timestamp}|%{SYSLOGTIMESTAMP:system.syslog.timestamp})? %{SYSLOGHOST:host.hostname}(?:_%{DATA:host.re})? %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{HANDLE_STATE:bgp.state}(?::%{NUMBER:bgp.number})?(?:%{SPECIAL_CHAR})?(?: %{DATA:bgp.protocol})?(?:%{SPECIAL_CHAR})?(?: %{IP:client.ip})? \\(%{DATA:bgp.as}\\)(?:%{SPECIAL_CHAR})? %{GREEDYMULTILINE:system.syslog.message}",
                /* STANDARD */
                "(?:%{SYSLOGTIMESTAMP:system.syslog.timestamp}|%{TIMESTAMP_ISO8601:system.syslog.timestamp})? %{SYSLOGHOST:host.hostname} %{DATA:process.name}(?:\\[%{POSINT:process.pid:long}\\])?: %{GREEDYMULTILINE:system.syslog.message}",
                "(?:%{SYSLOGTIMESTAMP:system.syslog.timestamp}|%{TIMESTAMP_ISO8601:system.syslog.timestamp})? %{GREEDYMULTILINE:system.syslog.message}"
            ]
          }
        },
        {
          "remove": {
            "field": "message"
          }
        },
        {
          "rename": {
            "field": "system.syslog.message",
            "target_field": "message",
            "ignore_missing": true
          }
        },
        {
          "date": {
            "formats": [
              "MMM  d HH:mm:ss",
              "MMM dd HH:mm:ss",
              "MMM d HH:mm:ss",
              "ISO8601"
            ],
            "on_failure": [
              {
                "append": {
                  "field": "error.message",
                  "value": "{{ _ingest.on_failure_message }}"
                }
              }
            ],
            "if": "ctx.event?.timezone == null",
            "field": "system.syslog.timestamp",
            "target_field": "@timestamp"
          }
        },
        {
          "date": {
            "if": "ctx.event?.timezone != null",
            "field": "system.syslog.timestamp",
            "target_field": "@timestamp",
            "formats": [
              "MMM  d HH:mm:ss",
              "MMM dd HH:mm:ss",
              "MMM d HH:mm:ss",
              "ISO8601"
            ],
            "timezone": "{{ event.timezone }}",
            "on_failure": [
              {
                "append": {
                  "field": "error.message",
                  "value": "{{ _ingest.on_failure_message }}"
                }
              }
            ]
          }
        },
        {
          "remove": {
            "field": "system.syslog.timestamp"
          }
        }
      ],
      "on_failure": [
        {
          "set": {
            "field": "error.message",
            "value": "{{ _ingest.on_failure_message }}"
          }
        }
      ]
    }            

and i didn't manage to create the document so here what I use :

POST _ingest/pipeline/logs_pipeline/_simulate/
{
  "docs": 
  [
   {
    "_index": "filebeat-7.6.0",
    "_source": 
      {
      "message": "Feb 7 00:22:59 pi-edge-ki-01_re0 rpd[1524]: bgp_ifachange_group:7279: NOTIFICATION sent to 10.12.128.35 (External AS 350371): code 6 (Cease) subcode 6 (Other Configuration Change), Reason: Interface change for the peer-group"
      }
    },
    {
    "_index": "filebeat-7.6.0",
    "_source": 
      {
      "message": "Feb 7 00:22:59 pi-edge-ki-01_re0 rpd[1524]: bgp_ifachange_group:7279: NOTIFICATION sent to 10.12.128.35 (External AS 350371): code 6 (Cease) subcode 6 (Other Configuration Change), Reason: Interface change for the peer-group"
      }
    },
    {
    "_index": "filebeat-7.6.0",
    "_source": 
      {
      "message": "Feb 10 22:40:26 pi-ar-ki-01 rpd[2121]: BGP_IO_ERROR_CLOSE_SESSION: BGP peer 10.12.128.12 (External AS 53600): Error event Operation timed out(60) for I/O session - closing it"
      }
    },
    {
    "_index": "filebeat-7.6.0",
    "_source": 
      {
      "message": "Mar 04 11:35:49 ig1-edge-ki-01 rpd[1524]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 10.58.128.35 (External AS 350371) changed state from EstabSync to Established (event RsyncAck) (instance master)"
      }
    },
    {
    "_index": "filebeat-7.6.0",
    "_source": 
      {
      "message": "Mar 16 02:09:35 pi-edge-ki-01 rpd[1524]: bgp_send: sending 19 bytes to 10.12.123.7 (Internal AS 40231) blocked (no spooling requested): Resource temporarily unavailable"
      }
    }
  ]
}

Thanks for your time.

Furthermore, for messages with the same syntax, it doesn't parse the same way. I tried to find the reason, but I couldn't.