Parsing multiple date types from message field with ingest node

dlazarov · September 25, 2018, 12:34pm

I have the following case. I have set up Filebeat to send logs straight to Elasticsearch, since there is no need for significant log parsing. I am receiving logs from different services, and the date format in the message field is different for each service. I'm using version 5.6.

I have set up a pipeline with grok and date processors like this:

PUT _ingest/pipeline/test_pipeline
{
  "description": "timestamp test",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{SYSLOGTIMESTAMP:systime}", "%{EXIM_DATE:eximdate}"]
      },
      "date": {
        "field": "systime",
        "formats": ["MMM dd HH:mm:ss"],
        "ignore_failure": true
      },
      "date": {
        "field": "eximdate",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "ignore_failure": true
      }, 
      "remove": {
        "field": ["systime", "eximdate"],
        "ignore_failure": true
      }
    }
  ]
}

I have checked the patterns individually and they both work. The issue is that when I simulate the pipeline with a message of each type, @timestamp is set only for the second pattern.

Here are the messages I am using to simulate the pipeline:

First pattern

  POST _ingest/pipeline/test_pipeline/_simulate
  {
    "docs": [
      {
        "_index": "index",
        "_source": 
        {
          "message": "Sep 24 12:17:01 ubuntu CRON[28771]: pam_unix(cron:session): session closed for user root"
        }
      }
    ]
  }

Second pattern

  POST _ingest/pipeline/test_pipeline/_simulate
  {
    "docs": [
      {
        "_source": 
        {
          "message": "2018-09-24 10:26:17 status unpacked apache2-utils:amd64 2.4.18-2ubuntu3.9"
        }
      }
    ]
  }

Here are the outputs for both types of messages:

First pattern simulation output

  {
    "docs": [
      {
        "doc": {
          "_index": "index",
          "_type": "_type",
          "_id": "_id",
          "_source": {
            "message": "Sep 24 12:17:01 ubuntu CRON[28771]: pam_unix(cron:session): session closed for user root"
          },
          "_ingest": {
            "timestamp": "2018-09-25T12:41:42.843Z"
          }
        }
      }
    ]
  }

Second pattern simulation output

  {
    "docs": [
      {
        "doc": {
          "_index": "index",
          "_type": "_type",
          "_id": "_id",
          "_source": {
            "@timestamp": "2018-09-24T10:26:17.000Z",
            "exim_month": "09",
            "exim_day": "24",
            "exim_time": "10:26:17",
            "exim_year": "2018",
            "message": "2018-09-24 10:26:17 status unpacked apache2-utils:amd64 2.4.18-2ubuntu3.9"
          },
          "_ingest": {
            "timestamp": "2018-09-25T12:42:46.070Z"
          }
        }
      }
    ]
  }

As you can see, the @timestamp field shows up only for the second kind of message. And yet each pattern works perfectly fine when it is the only pattern, with a single date processor, in the pipeline.
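To be explicit about the behaviour I'm after, here is a rough Python sketch. It is not part of my setup: `parse_timestamp` and `default_year` are made-up names, and the strptime directives are my own translation of the two date formats above. It tries the syslog format first and falls back to the second one. The year is passed in because the syslog lines don't carry one.

```python
from datetime import datetime

# Python equivalents (my translation) of the two date processor formats.
# A year is prepended to the syslog text, since syslog lines carry none.
SYSLOG_FORMAT = "%Y %b %d %H:%M:%S"   # "MMM dd HH:mm:ss" plus a supplied year
EXIM_FORMAT = "%Y-%m-%d %H:%M:%S"     # "yyyy-MM-dd HH:mm:ss"


def parse_timestamp(message, default_year):
    """Return a datetime parsed from the start of message, or None."""
    candidates = [
        (f"{default_year} {message[:15]}", SYSLOG_FORMAT),  # e.g. "Sep 24 12:17:01"
        (message[:19], EXIM_FORMAT),                        # e.g. "2018-09-24 10:26:17"
    ]
    for text, fmt in candidates:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # same idea as ignore_failure: try the next format
    return None


print(parse_timestamp("Sep 24 12:17:01 ubuntu CRON[28771]: session closed", 2018))
print(parse_timestamp("2018-09-24 10:26:17 status unpacked apache2-utils:amd64", 2018))
```

Both messages should end up with a timestamp, whichever format they use. That is what I expected from the pipeline.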

jakelandis · September 25, 2018, 1:39pm

Try this:

PUT _ingest/pipeline/test_pipeline
{
  "description": "timestamp test",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{SYSLOGTIMESTAMP:systime}",
          "%{EXIM_DATE:eximdate}"
        ]
      }
    },
    {
      "date": {
        "field": "systime",
        "formats": [
          "MMM dd HH:mm:ss"
        ],
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "eximdate",
        "formats": [
          "yyyy-MM-dd HH:mm:ss"
        ],
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": [
          "systime",
          "eximdate"
        ],
        "ignore_failure": true
      }
    }
  ]
}

^^ Note the extra { and } around each processor: each processor has to be its own object in the processors array. Newer versions (I'm not sure exactly from which one) don't allow the format you posted and will return an error when you try to create the pipeline.
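My guess at why only the second pattern worked: the pipeline you posted puts two "date" keys into the same JSON object, and many parsers silently keep only the last one. The sketch below uses Python's json module purely as an illustration. I'm assuming Elasticsearch 5.6 collapses duplicates in a similar way, which matches your simulate output but is an inference, not something I've verified. `reject_duplicate_keys` is a hypothetical helper for catching this before you PUT a pipeline.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    keys = [key for key, _ in pairs]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")
    return dict(pairs)


# Trimmed-down version of the original processor object: two "date" keys.
original = (
    '{"grok": {"field": "message"},'
    ' "date": {"field": "systime"},'
    ' "date": {"field": "eximdate"}}'
)

# Python's default behaviour: the last "date" entry replaces the first.
collapsed = json.loads(original)
print(collapsed["date"])

# With the hook, the duplicate is reported instead of silently dropped.
try:
    json.loads(original, object_pairs_hook=reject_duplicate_keys)
except ValueError as err:
    print(err)
```

With one processor per object, as in the pipeline above, there are no repeated keys and both date processors get a chance to run.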

EDIT: fixed example

dlazarov · September 26, 2018, 6:55am

That did it, thanks a bunch. It would have taken me quite a while to figure that out myself.

