Date filter seems to remove message

Hello,

My company is considering switching from Graylog to Logstash. I am setting up a proof of concept for this and have run into a strange problem: when I use the date filter, the message disappears. I would have expected it to remain visible, just with a _dateparsefailure tag.

The log itself is bulky because it comes from a Java program running inside a container. It is sent through the Docker log driver to syslog, where it is picked up by Filebeat and forwarded to Logstash.
The syslog part is handy because it adds the host name and the Docker container name. The syslog timings are kept but are mostly irrelevant; the timestamp we actually care about is the one between square brackets in the Java output. Extracting it is not an issue and works fine.
The problem is that as soon as I enable the date filter, the whole message disappears. I have read other threads, but they seem to describe different problems.

As a side note: how can I assure my management that all logs make it in, i.e. that none are discarded?

Example log:
Jun 14 09:45:00 test1 testsystem-acc-1-candidate[1618]: [2017-06-14 09:45:00.629] INFO Timer-0 be.private.consulservice.impl.ConsulServiceImpl Sending service state PASS for service acc_71_0669739

Below is my Logstash config. It gives me an additional field called virgo_timestamp, and that part works.

input
{
  beats
  {
    port => 5046
    type => syslog
  }
}

filter
{
  if [type] == "syslog"
  {
    grok
    {
      # Here we get the syslog data out of the message
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:syslog_hostname} %{SYSLOGPROG}: %{GREEDYDATA:syslog_message}" }
    }
  }
  if [syslog_message] and [program] !~ "nginx" and [program] !~ "filebeat"
  {
    grok
    {
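      # Extract the application timestamp between the square brackets into virgo_timestamp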
      match => { "syslog_message" => "\[%{TIMESTAMP_ISO8601:virgo_timestamp}\]" }
      tag_on_failure => ["_grokparsefailure","virgo_time_grok_failed"]
    }
    #date
    #{
    #  match => [ "virgo_timestamp" , "ISO8601" ]
    #  tag_on_failure => ["_dateparsefailure","set_virgo_date_failed"]
    #}
  }
  if [message] =~ "json:"
  {
    grok
    {
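      # Some of our applications log structured data after a "json:" prefix; capture that part here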
      match => { "message" => "json:%{GREEDYDATA:json_data}" }
    }
    json
    {
      source => "json_data"
    }
  }
}

output
{
  elasticsearch
  {
    hosts => "192.168.3.28:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
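
To be explicit about what "activating the date filter" means: I simply uncomment the date block above, so the filter I run is the following (the comment about the field's format is mine):

date
{
  # virgo_timestamp looks like "2017-06-14 09:45:00.629"
  match => [ "virgo_timestamp" , "ISO8601" ]
  tag_on_failure => ["_dateparsefailure","set_virgo_date_failed"]
}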

The config above gives me a document similar to the following:

{
  "_index": "filebeat-2017.06.14",
  "_type": "syslog",
  "_id": "AVylkBiDKgtg5zxQpxdg",
  "_version": 1,
  "_score": null,
  "_source": {
    "offset": 13550939,
    "input_type": "log",
    "pid": "1618",
    "source": "/var/log/syslog",
    "program": "testsystem-acc-1-candidate",
    "message": "Jun 14 09:45:00 test1 testsystem-acc-1-candidate[1618]: [2017-06-14 09:45:00.629] INFO  Timer-0                      be.private.consulservice.impl.ConsulServiceImpl               Sending service state PASS for service acc_71_76251",
    "type": "syslog",
    "syslog_message": "[2017-06-14 09:45:00.629] INFO  Timer-0                      be.private.consulservice.impl.ConsulServiceImpl               Sending service state PASS for service acc_71_76251",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "@timestamp": "2017-06-14T07:45:01.426Z",
    "syslog_hostname": "test1",
    "syslog_timestamp": "Jun 14 09:45:00",
    "@version": "1",
    "beat": {
      "hostname": "test1filebeat_kibana",
      "name": "test1filebeat_kibana",
      "version": "5.4.1"
    },
    "host": "test1filebeat_kibana",
    "virgo_timestamp": "2017-06-14 09:45:00.629"
  },
  "fields": {
    "@timestamp": [
      1497426301426
    ]
  },
  "sort": [
    1497426301426
  ]
}

As you can see, a lot of work remains (filtering out and using the host from syslog, a multiline grok for stack traces, etc.), but that is doable. I'm just puzzled by the messages that disappear when the date filter is activated.
Other messages do get through, but only those that do not enter this 'if':

if [syslog_message] and [program] !~ "nginx" and [program] !~ "filebeat"

Could be a mapping conflict with what's in ES already. The Logstash logs should tell you about that.

Hello Magnus. Thank you for your reply.

There are indeed 3 messages in the Logstash output. Is it normal to get only 3 similar error messages instead of one error message per failed log line?
From what I can see, the JSON parser failed on the syslog date. The message is one of those that no longer show up, so it must be related. All three log lines come from an exception that one of our applications threw.

However, I don't see how this is related to the date behaviour.

08:49:59.891 [[main]>worker8] WARN  logstash.filters.json - Error parsing json {:source=>"json_data", :raw=>"{ \"msg\": \"end of hessian call\", \"call_id\": \"243\", \"time_ms\": \"0\" }\nJun 14 10:49:57 test1 testsystem-ct-1-candidate[1618]: [2017-06-14 10:49:57.035] INFO  pool-17-thread-1             be.private.loaderservlet.imagecapture.PhatomJsService         #011Returning file of size 4037", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'Jun': was expecting ('true', 'false' or 'null')
 at [Source: [B@49412eba; line: 2, column: 5]>}
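
To at least make these failures easy to find in Kibana, I am thinking of tagging them explicitly in the json filter, roughly like this (just a sketch, assuming the json filter supports a tag_on_failure option the way grok and date do):

json
{
  source => "json_data"
  # tag events whose json_data does not parse so they can be searched for
  tag_on_failure => ["_jsonparsefailure","json_data_parse_failed"]
}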

Is there something else I can verify, or can I enable more verbose logging specifically for the date filter?

Hello @magnusbaeck

I've managed to find some additional information, and it appears the problem is unrelated to the date filter. As the debug output below shows, my application's timestamp is correctly applied to the @timestamp field.

11:47:10.213 [[main]>worker3] DEBUG logstash.pipeline - output received {"event"=>{"offset"=>33824306, "input_type"=>"log", "pid"=>"1618", "source"=>"/var/log/syslog", "program"=>"testsystem-ts-2-candidate", "message"=>"Jun 14 13:47:05 test1 testsystem-ts-2-candidate[1618]: [2017-06-14 13:47:05.458] INFO  Timer-0                      be.private.consulservice.impl.ConsulServiceImpl               Sending service state PASS for service tsfzkc_63_4886", "type"=>"syslog", "syslog_message"=>"[2017-06-14 13:47:05.458] INFO  Timer-0                      be.private.consulservice.impl.ConsulServiceImpl               Sending service state PASS for service tsfzkc_63_4886", "tags"=>["beats_input_codec_plain_applied"], "@timestamp"=>2017-06-14T13:47:05.458Z, "syslog_hostname"=>"test1", "syslog_timestamp"=>"Jun 14 13:47:05", "@version"=>"1", "beat"=>{"hostname"=>"test1filebeat_kibana", "name"=>"test1filebeat_kibana", "version"=>"5.4.1"}, "host"=>"test1filebeat_kibana", "virgo_timestamp"=>"2017-06-14 13:47:05.458"}} 

The question then remains why these messages are not shown in Kibana.
I've sent all Logstash output to stdout, as per your suggestion in another thread, and the lines are there with the correctly updated timestamp.
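
For reference, the extra output I added alongside the elasticsearch output is roughly the following (a sketch of what I assume was meant by sending everything to stdout):

output
{
  stdout
  {
    # print each event in full so @timestamp can be inspected on the console
    codec => rubydebug
  }
}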

So this moves the issue out of the Logstash camp and into Elasticsearch or Kibana.

I went into the Kibana Dev Tools and checked the raw Elasticsearch output.
One of the messages that went through the date filter appears to have been saved like this:

"@timestamp": "2017-06-14T00:01:22.570Z",
          "syslog_hostname": "test1",
          "syslog_timestamp": "Jun 14 02:01:21",

In other words, the syslog timestamp is OK, and Logstash's stdout output looks OK too. But Elasticsearch saved it with a different time?

Hello @magnusbaeck

I just wanted to let you know that you were correct.
The Elasticsearch index rolled over today (we have one per day), and now it works. So it was indeed a mapping conflict.

Thank you for your help.

Best regards,
Evert
