Hello,
My company is considering switching from Graylog to Logstash. To that end I am setting up a proof of concept, and I've run into a strange problem: when I use the date filter, the message disappears. I would have expected it to remain visible, but with a _dateparsefailure tag.
The log itself is bulky because it comes from a Java program running inside a container. It is sent through the Docker log driver to syslog, where it is picked up by Filebeat and forwarded to Logstash.
The syslog part is handy because it gives us the host name and the Docker container name. The syslog timestamps are kept, but they are mostly irrelevant; the timing we are really interested in is the one between square brackets in the Java output. Extracting that is not an issue and works fine.
The problem is that as soon as I enable the date filter, the whole message disappears. I have read other threads, but they seem to deal with different problems.
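To be concrete, this is the filter that makes them vanish when I uncomment it (the same block appears, commented out, in the full config below):

  date
  {
    match => [ "virgo_timestamp" , "ISO8601" ]
    tag_on_failure => ["_dateparsefailure","set_virgo_date_failed"]
  }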
As a side note: how can I assure my management that all logs are in, i.e. that none are discarded?
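One idea I am considering for that (just a sketch, nothing decided, and the path is made up): write every event to a local file in addition to Elasticsearch, so line counts can be reconciled against the source:

  output
  {
    # Keep a local copy of every event so counts can be compared
    # against the number of lines shipped from the source.
    file
    {
      path => "/var/log/logstash/archive-%{+YYYY.MM.dd}.log"
      codec => line { format => "%{message}" }
    }
  }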
Example log line:
Jun 14 09:45:00 test1 testsystem-acc-1-candidate[1618]: [2017-06-14 09:45:00.629] INFO Timer-0 be.private.consulservice.impl.ConsulServiceImpl Sending service state PASS for service acc_71_0669739
Below is my Logstash config. It gives me an additional field called virgo_timestamp, and that part works.
input
{
  beats
  {
    port => 5046
    type => "syslog"
  }
}

filter
{
  if [type] == "syslog"
  {
    grok
    {
      # Here we get the syslog data out of the message
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:syslog_hostname} %{SYSLOGPROG}: %{GREEDYDATA:syslog_message}" }
    }
  }
  if [syslog_message] and [program] !~ "nginx" and [program] !~ "filebeat"
  {
    grok
    {
      match => { "syslog_message" => "\[%{TIMESTAMP_ISO8601:virgo_timestamp}\]" }
      tag_on_failure => ["_grokparsefailure","virgo_time_grok_failed"]
    }
    #date
    #{
    #  match => [ "virgo_timestamp" , "ISO8601" ]
    #  tag_on_failure => ["_dateparsefailure","set_virgo_date_failed"]
    #}
  }
  if [message] =~ "json:"
  {
    grok
    {
      match => { "message" => "json:%{GREEDYDATA:json_data}" }
    }
    json
    {
      source => "json_data"
    }
  }
}

output
{
  elasticsearch
  {
    hosts => "192.168.3.28:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
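One thing I notice while writing this up: virgo_timestamp has a space between the date and the time rather than a 'T'. In case the ISO8601 matcher is strict about that, an explicit Joda pattern would be an alternative to try (my own untested guess, not part of the config above):

  date
  {
    match => [ "virgo_timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
    tag_on_failure => ["_dateparsefailure","set_virgo_date_failed"]
  }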
The config above gives me something similar to this:
{
  "_index": "filebeat-2017.06.14",
  "_type": "syslog",
  "_id": "AVylkBiDKgtg5zxQpxdg",
  "_version": 1,
  "_score": null,
  "_source": {
    "offset": 13550939,
    "input_type": "log",
    "pid": "1618",
    "source": "/var/log/syslog",
    "program": "testsystem-acc-1-candidate",
    "message": "Jun 14 09:45:00 test1 testsystem-acc-1-candidate[1618]: [2017-06-14 09:45:00.629] INFO Timer-0 be.private.consulservice.impl.ConsulServiceImpl Sending service state PASS for service acc_71_76251",
    "type": "syslog",
    "syslog_message": "[2017-06-14 09:45:00.629] INFO Timer-0 be.private.consulservice.impl.ConsulServiceImpl Sending service state PASS for service acc_71_76251",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "@timestamp": "2017-06-14T07:45:01.426Z",
    "syslog_hostname": "test1",
    "syslog_timestamp": "Jun 14 09:45:00",
    "@version": "1",
    "beat": {
      "hostname": "test1filebeat_kibana",
      "name": "test1filebeat_kibana",
      "version": "5.4.1"
    },
    "host": "test1filebeat_kibana",
    "virgo_timestamp": "2017-06-14 09:45:00.629"
  },
  "fields": {
    "@timestamp": [
      1497426301426
    ]
  },
  "sort": [
    1497426301426
  ]
}
As you can see, a lot of work remains (extracting and using the host from syslog, grokking multiline stack traces, etc.), but that is doable. I'm just puzzled by the messages disappearing when the date filter is activated.
Other messages do get through, but only those that do not enter this 'if':
if [syslog_message] and [program] !~ "nginx" and [program] !~ "filebeat"
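The next thing I plan to try (again just a sketch): temporarily add a stdout output with the rubydebug codec, so I can see whether the events are dropped inside Logstash or whether they reach Elasticsearch and I am simply not finding them in Kibana:

  output
  {
    # Temporary debugging aid: print every event that leaves the
    # pipeline, including its @timestamp and tags.
    stdout { codec => rubydebug }
  }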