Parsing multiple date types from message field with ingest node


#1

I have the following case. I have set up Filebeat to send logs straight to Elasticsearch, since there is no need for significant log parsing. I am receiving logs from different services, and the date format in the message field is different for each service. I'm using version 5.6.

I have set up a pipeline with grok and date processors like this:

PUT _ingest/pipeline/test_pipeline
{
  "description": "timestamp test",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{SYSLOGTIMESTAMP:systime}", "%{EXIM_DATE:eximdate}"]
      },
      "date": {
        "field": "systime",
        "formats": ["MMM dd HH:mm:ss"],
        "ignore_failure": true
      },
      "date": {
        "field": "eximdate",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "ignore_failure": true
      }, 
      "remove": {
        "field": ["systime", "eximdate"],
        "ignore_failure": true
      }
    }
  ]
}

I have checked the patterns individually and they both work. The issue is that when I simulate the pipeline with a message of each type, @timestamp is set only for the second pattern.

Here are the messages I am using to simulate the pipeline:

  • First pattern

      POST _ingest/pipeline/test_pipeline/_simulate
      {
        "docs": [
          {
            "_index": "index",
            "_source": 
            {
              "message": "Sep 24 12:17:01 ubuntu CRON[28771]: pam_unix(cron:session): session closed for user root"
            }
          }
        ]
      }
    
  • Second pattern

      POST _ingest/pipeline/test_pipeline/_simulate
      {
        "docs": [
          {
            "_source": 
            {
              "message": "2018-09-24 10:26:17 status unpacked apache2-utils:amd64 2.4.18-2ubuntu3.9"
            }
          }
        ]
      }
    

Here are the outputs for both types of messages:

  • First pattern simulation output

      {
        "docs": [
          {
            "doc": {
              "_index": "index",
              "_type": "_type",
              "_id": "_id",
              "_source": {
                "message": "Sep 24 12:17:01 ubuntu CRON[28771]: pam_unix(cron:session): session closed for user root"
              },
              "_ingest": {
                "timestamp": "2018-09-25T12:41:42.843Z"
              }
            }
          }
        ]
      }
    
  • Second pattern simulation output

      {
        "docs": [
          {
            "doc": {
              "_index": "index",
              "_type": "_type",
              "_id": "_id",
              "_source": {
                "@timestamp": "2018-09-24T10:26:17.000Z",
                "exim_month": "09",
                "exim_day": "24",
                "exim_time": "10:26:17",
                "exim_year": "2018",
                "message": "2018-09-24 10:26:17 status unpacked apache2-utils:amd64 2.4.18-2ubuntu3.9"
              },
              "_ingest": {
                "timestamp": "2018-09-25T12:42:46.070Z"
              }
            }
          }
        ]
      }
    

You can clearly see that the @timestamp field shows up only for the second kind of message. And yet each pattern works perfectly fine when it is the only pattern in the pipeline, with a single date processor.


(Jake Landis) #2

Try this:

PUT _ingest/pipeline/test_pipeline
{
  "description": "timestamp test",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{SYSLOGTIMESTAMP:systime}",
          "%{EXIM_DATE:eximdate}"
        ]
      }
    },
    {
      "date": {
        "field": "systime",
        "formats": [
          "MMM dd HH:mm:ss"
        ],
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "eximdate",
        "formats": [
          "yyyy-MM-dd HH:mm:ss"
        ],
        "ignore_failure": true
      }
    },
    {
      "remove": {
        "field": [
          "systime",
          "eximdate"
        ],
        "ignore_failure": true
      }
    }
  ]
}

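^^ Note the extra { and } around each processor: every entry in the processors array is now its own object containing a single processor. Newer versions (not sure exactly from which one) don't allow the format you posted and will return an error when you try to create the pipeline.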

EDIT: fixed example
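
A likely explanation for the original symptom, not confirmed in this thread: in the pipeline from post #1, both date processors sit in the same JSON object, so the key "date" appears twice. Many JSON parsers keep only the last value for a repeated key, which would match the simulation output where only the eximdate processor takes effect. The Python sketch below shows that behaviour with the standard json module. How Elasticsearch 5.6 itself handles the duplicate is an assumption here, and duplicate_keys is a hypothetical helper written for this illustration, not part of any Elasticsearch client.

```python
import json

# Condensed copy of the pipeline from post #1: a single object in the
# "processors" array holds every processor, so "date" is used as a key twice.
ORIGINAL = """
{
  "processors": [
    {
      "grok": {"field": "message",
               "patterns": ["%{SYSLOGTIMESTAMP:systime}", "%{EXIM_DATE:eximdate}"]},
      "date": {"field": "systime", "formats": ["MMM dd HH:mm:ss"], "ignore_failure": true},
      "date": {"field": "eximdate", "formats": ["yyyy-MM-dd HH:mm:ss"], "ignore_failure": true},
      "remove": {"field": ["systime", "eximdate"], "ignore_failure": true}
    }
  ]
}
"""

# Same processors in the layout from post #2: one processor per object.
FIXED = json.dumps({
    "processors": [
        {"grok": {"field": "message",
                  "patterns": ["%{SYSLOGTIMESTAMP:systime}", "%{EXIM_DATE:eximdate}"]}},
        {"date": {"field": "systime", "formats": ["MMM dd HH:mm:ss"], "ignore_failure": True}},
        {"date": {"field": "eximdate", "formats": ["yyyy-MM-dd HH:mm:ss"], "ignore_failure": True}},
        {"remove": {"field": ["systime", "eximdate"], "ignore_failure": True}},
    ]
})


def duplicate_keys(text):
    """Hypothetical helper: list keys repeated within any single JSON object."""
    found = []

    def hook(pairs):
        # 'pairs' is the ordered list of (key, value) tuples of one object.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                found.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return found


# A plain parse silently keeps only the last "date" entry.
merged = json.loads(ORIGINAL)["processors"][0]
print(merged["date"]["field"])    # eximdate
print(duplicate_keys(ORIGINAL))   # ['date']
print(duplicate_keys(FIXED))      # []
```

Running a check like this on a pipeline body before sending it is one way to catch the problem on versions that accept the request without complaint.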


#3

That did it, thanks a bunch. It would have taken me quite a while to figure that out myself.

