Logstash Date Filter issue

Hello all. As part of testing Elasticsearch for use within our organisation, I'm trying to capture entries from the Windowsupdate.log file in C:\Windows, but I'm having trouble converting the log's timestamp into the Elasticsearch @timestamp.

I created a filter to add the date to a custom field called 'timestamp' - my logstash.conf is below:

filter {

        if [type] == "winupdatelog" {
                grok {
                        match => { "message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{HOUR}:?%{MINUTE}(?::?%{SECOND}))\t%{GREEDYDATA:PID}\t%{GREEDYDATA:TID}\t%{GREEDYDATA:component}\t%{GREEDYDATA:message} " }
                        add_field => [ "received_at", "%{@timestamp}" ]
                        add_field => [ "received_from", "%{@source_host}" ]
                }

                date {
                        match => [ "timestamp", "yyyy-MM-dd        HH:mm:ss,SSS" ]
                }
        }
}

However, it doesn't seem to be working; when I view the documents in Kibana, they all have the date they were received rather than the one in the actual log. Below is the JSON output for one of the documents:

{
  "_index": "filebeat-2017",
  "_type": "winupdatelog",
  "_id": "AVrkOHzT3xn1BxItwBNG",
  "_score": null,
  "_source": {
    "offset": 229679,
    "input_type": "log",
    "PID": " 976",
    "source": "C:\\Windows\\Windowsupdate.log",
    "message": [
      "2017-03-03\t20:36:04:398\t 976\t1930\tMisc\tValidating signature for C:\\Windows\\SoftwareDistribution\\WuRedir\\7971F918-A847-4430-9279-4A52D1EFE18D\\TMP87F9.tmp with dwProvFlags 0x00000080:",
      "Validating signature for C:\\Windows\\SoftwareDistribution\\WuRedir\\7971F918-A847-4430-9279-4A52D1EFE18D\\TMP87F9.tmp with dwProvFlags"
    ],
    "type": "winupdatelog",
    "TID": "1930",
    "tags": [
      "beats_input_codec_plain_applied",
      "_dateparsefailure"
    ],
    "component": "Misc",
    "received_from": "%{@source_host}",
    "@timestamp": "2017-03-19T01:39:44.955Z",
    "received_at": "2017-03-19T01:39:44.955Z",
    "@version": "1",
    "beat": {
      "hostname": "LAPTOP01",
      "name": "LAPTOP01",
      "version": "5.2.2"
    },
    "host": "LAPTOP01",
    "timestamp": "2017-03-03\t20:36:04:398"
  },
  "fields": {
    "@timestamp": [
      1489887584955
    ]
  },
  "sort": [
    1489887584955
  ]
}

Can anyone tell me where I'm going wrong?

Thanks.

Check your Logstash logs. The date filter will tell you why it had problems parsing your timestamp.

Thanks for getting back to me. The error in the Logstash log is as follows:

Failed parsing date from field {:field=>"timestamp", :value=>"2017-03-01\t23:35:20:473", :exception=>"Invalid format: "2017-03-01\t23:35:20:473" is malformed at ":473"", :config_parsers=>"yyyy-MM-dd\tHH:mm:ss,SSS", :config_locale=>"default=en_US"}

So it appears to be a problem with the millisecond formatting, but as far as I can see I've used the right format: ,SSS

Your logs have a colon before the milliseconds so your date pattern should end with :SSS.

Thanks, I changed the match statement to be as follows:

date {
        match => [ "timestamp", "yyyy-MM-dd        HH:mm:ss:SSS" ]
}

However, when I attempt to restart Logstash, it fails to start. If I issue the command to test the config:

./logstash -f /usr/local/etc/logstash/logstash.conf -t

The result is as follows:

[2017-03-20T16:10:36,518][DEBUG][logstash.runner          ] --------------- Logstash Settings -------------------
[2017-03-20T16:10:36,520][DEBUG][logstash.runner          ] Reading config file {:config_file=>"/usr/local/etc/logstash/logstash.conf"}
[2017-03-20T16:10:36,528][FATAL][logstash.runner          ] The given configuration is invalid. Reason: Expected one of #, input, filter, output at line 3, column 9 (byte 23) after #input {

I've got Logstash to start again; there was a '#' at the beginning of the file, and I've no idea where that came from :slight_smile:

Although Logstash starts correctly, I'm still not getting the correct timestamp for the logs, and strangely, since I made the suggested change, I'm no longer seeing the custom 'timestamp' field or the 'received_at' and 'received_from' fields that I told Logstash to add.

After further investigation, it seems to be working correctly for the most part; however, 968 records have the _grokparsefailure tag, and these are the records with the wrong timestamp and missing fields. I suppose I'll need to investigate what the problem is with these documents.

As far as I can see, I'm getting the _grokparsefailure tag on log entries that only contain unusual characters, e.g. '###########' or '-----------------'.

I'm using %{GREEDYDATA:message} to capture this data; is there something else I need to be doing to get this data without a parse error?

Please give an example of such a message.

Thanks again, message lines below:

2017-03-19 16:21:49:356 6296 454 COMAPI -------------
2017-03-19 16:21:49:357 6296 454 COMAPI -- START -- COMAPI: Search [ClientId = Windows Defender]
2017-03-19 16:21:49:357 6296 454 COMAPI ---------

The lines that only contain --------- and no other characters (lines 1 and 3) are the ones that are failing to parse. They don't contain any useful information, so I suppose I could ignore them, but I'd still like to know the reason.
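If I do end up ignoring them, I'm assuming a conditional like the following, placed before the grok filter, would drop the separator-only lines (just a sketch, untested, and it assumes the raw line still ends with the tab-separated dash/hash field):

    filter {
            # Drop events whose tab-separated tail is only dashes or hashes.
            if [type] == "winupdatelog" and [message] =~ /\t[-#]+\s*$/ {
                    drop { }
            }
    }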

The end of your grok expression looks like this:

\t%{GREEDYDATA:PID}\t%{GREEDYDATA:TID}\t%{GREEDYDATA:component}\t%{GREEDYDATA:message}

Messages with no tab characters will never match.

You really should avoid having more than one GREEDYDATA or DATA in your grok expression. It's a performance killer and can result in unexpected matches.
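For illustration, a tighter pattern for these lines might look something like this - just a sketch using the standard NUMBER, NOTSPACE and SPACE grok patterns, and it would still need testing against your full log. It also leaves out the literal trailing space that sits just before the closing quote in your current pattern:

    grok {
            # PID and TID as numbers, component as a non-space token, so only the
            # free-text tail is left to GREEDYDATA; no trailing literal space.
            match => { "message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{HOUR}:?%{MINUTE}(?::?%{SECOND}))\t%{SPACE}%{NUMBER:PID}\t%{NUMBER:TID}\t%{NOTSPACE:component}\t%{GREEDYDATA:message}" }
    }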

Thanks; however, when I use the following grok debugger (https://grokdebug.herokuapp.com), the line does match:

Input: 2017-03-19 16:21:49:356 6296 454 COMAPI -------------

Filter: (?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{HOUR}:?%{MINUTE}(?::?%{SECOND}))\t%{GREEDYDATA:PID}\t%{GREEDYDATA:TID}\t%{GREEDYDATA:Component}\t%{GREEDYDATA:message}

Output:

{
  "timestamp": [
    [
      "2017-03-19\t16:21:49:356"
    ]
  ],
  "YEAR": [
    [
      "2017"
    ]
  ],
  "MONTHNUM": [
    [
      "03"
    ]
  ],
  "MONTHDAY": [
    [
      "19"
    ]
  ],
  "HOUR": [
    [
      "16"
    ]
  ],
  "MINUTE": [
    [
      "21"
    ]
  ],
  "SECOND": [
    [
      "49:356"
    ]
  ],
  "PID": [
    [
      "6296"
    ]
  ],
  "TID": [
    [
      "454"
    ]
  ],
  "Component": [
    [
      "COMAPI"
    ]
  ],
  "message": [
    [
      "-------------"
    ]
  ]
}

Ah, right. Not sure off the top of my head why that's the case.

Yeah, I've looked in the Logstash log and can't see anything obviously wrong with these particular entries. I don't know whether it's relevant, but whilst I'm using Filebeat 5.2.2, I am using an older version of the rest of the stack, with Logstash, Elasticsearch and Kibana at version 5.0.2.

Also, I'm a little confused about the output of the messages that are matching properly. If you look at my match filter, I've tried to break the data up into TID, PID and Component fields - this does work, but I'm still getting the entire line in the message field too - is that normal? JSON example of a message below:

{
  "_index": "filebeat-2017",
  "_type": "winupdatelog",
  "_id": "AVrsydmn3xn1BxItwUTA",
  "_score": null,
  "_source": {
    "offset": 1266943,
    "input_type": "log",
    "PID": "3592",
    "source": "C:\\Windows\\Windowsupdate.log",
    "message": [
      "2017-03-20\t13:22:15:152\t3592\t1764\tCOMAPI\t>>--  RESUMED  -- COMAPI: Search [ClientId = Windows Defender]",
      ">>--  RESUMED  -- COMAPI: Search [ClientId = Windows"
    ],
    "type": "winupdatelog",
    "TID": "1764",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "component": "COMAPI",
    "received_from": "%{@source_host}",
    "@timestamp": "2017-03-20T13:22:15.152Z",
    "received_at": "2017-03-20T17:35:29.540Z",
    "@version": "1",
    "beat": {
      "hostname": "LAPTOP01",
      "name": "LAPTOP01",
      "version": "5.2.2"
    },
    "host": "LAPTOP01",
    "timestamp": "2017-03-20\t13:22:15:152"
  },
  "fields": {
    "@timestamp": [
      1490016135152
    ]
  },
  "sort": [
    1490016135152
  ]
}

Also, I'm a little confused about the output of the messages that are matching properly. If you look at my match filter, I've tried to break the data up into TID, PID and Component fields - this does work, but I'm still getting the entire line in the message field too - is that normal?

Yes. Unless specifically configured to do otherwise, a grok filter doesn't touch the field it parses.
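If you'd rather have the captured remainder replace the original line, grok has an overwrite option. A minimal sketch based on your config:

    grok {
            match => { "message" => "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{HOUR}:?%{MINUTE}(?::?%{SECOND}))\t%{GREEDYDATA:PID}\t%{GREEDYDATA:TID}\t%{GREEDYDATA:component}\t%{GREEDYDATA:message}" }
            # Replace the existing [message] field with the value captured above.
            overwrite => [ "message" ]
    }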

Thanks for your help with all this; if I can figure out why I'm getting a parse error with those messages, I'll post back.
