Logstash filter to drop data

Good morning... new user here, so be gentle if I don't quite say things correctly :slight_smile:

My Logstash is v2.2.2, running on CentOS 7.3.

Part of the data Logstash handles comes from a Progress RDBMS database log file. In that file, a line is added every time the database starts or stops, with the format:

" EEE MMM dd HH:mm:ss yyyy" as in " Wed Oct 11 16:33:22 2017"

We do not need to send these lines to Logstash.

We see constant errors in logstash.log:

{:timestamp=>"2017-11-01T09:00:52.334000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-01T09:00:52.335000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-01T09:00:52.336000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Wed Nov 1 09:00:42 2017", :exception=>"Invalid format: " Wed Nov 1 09:00..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-01T09:00:52.336000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Wed Nov 1 09:00:42 2017", :exception=>"Invalid format: " Wed Nov 1 09:00..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-01T09:00:52.338000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-01T09:00:52.339000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-01T09:00:52.339000-0700", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Wed Nov 1 09:00:42 2017", :exception=>"Invalid format: " Wed Nov 1 09:00..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}

I am certain these are due to the date/time format of the aforementioned database log lines being parsed.

I found a reference to creating a filter, which I did as follows:

    filter {
      if [datetimein] == "                EEE MMM dd HH:mm:ss yyyy" {
        drop { }
      }
    }

configtest shows:

logstash --configtest -f logstash.conf.NEW
Error: Expected one of #, => at line 74, column 12 (byte 2072) after filter {

That location is this filter, so it's obvious I have an issue with this configuration. Being a novice, I'm wondering if any of the experts here can offer a solution.

Thanks!

"EEE MMM ..." is a date pattern that you can't use in this context. You need a regular expression.

if [datetimein] =~ /^\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {

Untested but should be reasonably close.
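
A minimal sketch of how that could look as its own filter block (untested, using the datetimein field from your log messages):

    filter {
      if [datetimein] =~ /^\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
        drop { }
      }
    }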

Thanks Magnus,

I've tried this as shown:

if [fields][logtype] == "dblog" {
  uuid { target => "correlation_id" }
  mutate {
    replace => { application => "dblog" }
    replace => { type => "log" }
    add_field => { "data_version" => "1" }
    add_field => { "hostname" => "%{[beat][hostname]}" }
    add_field => { "datetimein" => "%{message}" }
    add_field => { "datetimein2" => "" }
    add_field => { "description" => "%{message}" }
  }
  mutate {
    split => ["datetimein", "]"]
  }
  mutate {
    replace => [ "datetimein", "%{[datetimein][0]}" ]
  }
  mutate { gsub => [ "datetimein", "[", "" ] }
  date {
    match => [ "datetimein", "yyyy/MM/dd@HH:mm:ss.SSSZ" ]
    target => "datetimein2"
    timezone => "UTC"
  }
  date {
    match => [ "datetimein", "yyyy/MM/dd@HH:mm:ss.SSSZ" ]
    target => "@timestamp"
    timezone => "UTC"
  }
  grok {
    match => { "source" => "/travel/91/env/%{DATA:[tavs_data][entityid]}/db/%{DATA:[tavs_data][tavsid]}/" }
  }
  # filter to ignore/drop records from DB log showing date change of any type
  # record content always like this "                EEE MMM dd HH:mm:ss yyyy"
  # EXAMPLE "                Wed Nov  1 00:00:01 2017"
  filter {
    if [datetimein] =~ /^\s{16}\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
      drop { }
    }
  }
}
but I still get the error:

Error: Expected one of #, => at line 79, column 12 (byte 2391) after filter {

I feel like I have omitted something. I saw another post referencing specifying a field to match, but I don't quite understand that.

You have a filter section within a filter section and that's not allowed.

filter {
  if [datetimein] =~ /^\s{16}\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
    drop { }
  }
}

Remove filter { and the corresponding }.
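
In other words, a sketch of that part of your configuration with the extra wrapper removed:

    if [datetimein] =~ /^\s{16}\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
      drop { }
    }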

Thanks Magnus... that resolved the config error. However, I am still seeing these errors in my logstash.log file:

{:timestamp=>"2017-11-06T06:40:44.956000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.957000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.960000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.960000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.962000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.962000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.965000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.966000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>"%{datetimein]0]}", :exception=>"Invalid format: "%{datetimein]0]}"", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.967000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}
{:timestamp=>"2017-11-06T06:40:44.968000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}

Perhaps the location of the 'if' statement in the config file is preventing it from being properly executed?

{:timestamp=>"2017-11-06T06:40:44.956000-0800", :message=>"Failed parsing date from field", :field=>"datetimein", :value=>" Mon Nov 6 06:40:33 2017", :exception=>"Invalid format: " Mon Nov 6 06:40..."", :config_parsers=>"yyyy/MM/dd@HH:mm:ss.SSSZ", :config_locale=>"default=en_US", :level=>:warn}

It looks like the datetimein field contains " Mon Nov 6 06:40:33 2017" (with a leading space) but you've asked the date filter to parse it with the pattern "yyyy/MM/dd@HH:mm:ss.SSSZ".
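
One other thing worth checking, as a general note: filters are applied in the order they appear in the configuration, so for the drop to prevent these warnings the conditional has to come before the date filters that parse datetimein. A rough sketch of that ordering, using the names from your config:

    # drop the database start/stop date lines before any date parsing happens
    if [datetimein] =~ /^\s{16}\w+ \w+\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
      drop { }
    }
    date {
      match => [ "datetimein", "yyyy/MM/dd@HH:mm:ss.SSSZ" ]
      target => "@timestamp"
      timezone => "UTC"
    }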

The data record includes 16 spaces before the day of the week.

Here is the if statement:

if [datetimein] =~ /^\s{16}\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
  drop { }
}

I'm guessing the regex is a little off. Should I remove the "~"? You had it in your example, so I went with it.

The tilde needs to be there, otherwise it won't perform a regexp match.
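
To illustrate the difference (a small sketch, not taken from your config):

    # == compares the field against a literal string
    if [datetimein] == "some exact value" {
      drop { }
    }

    # =~ matches the field against a regular expression
    if [datetimein] =~ /some pattern/ {
      drop { }
    }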

Yeah, I found that out.

So the question is: is my format correct?

if [datetimein] =~ /^\s{16}\w+ \w+ \d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
  drop { }
}

Each of these appears in the log in this manner:

"                Wed Nov  1 00:00:01 2017" (16 leading spaces)

Am I correct that \w+ \w+ is matching "Wed Nov" as the day and month, as I think it is?

Yes.

If you format the line you want to match as preformatted text we'll see exactly what it looks like. I suspect you actually have two spaces before the day of the month and the regexp doesn't take that into account.

I copied the exact line from the log file we're parsing, and its format is:

                Wed Nov  1 00:00:02 2017

That's exactly 16 spaces prior to "Wed"; there's nothing else on that line, but I did miss the space between the day of the month and the time and have added that in. Do I need to explicitly account for each space (whitespace) in the record as well?

Indeed, there are two spaces before the day of the month. Replace the single space prior to \d{1,2} with \s{1,2}.
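
That is, a sketch of the adjusted conditional:

    if [datetimein] =~ /^\s{16}\w+ \w+\s{1,2}\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$/ {
      drop { }
    }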

Forgive me, but there have to be two spaces when the day of the month is a single digit, and one space when it is two digits. Doesn't \d{1,2} mean 1 or 2 digits?

Would \s+ be more appropriate prior to \d{1,2}?

Unfortunately, the day of the month will be either 1 or 2 digits, and this is not configurable, since it's the database application that writes it out to the log.

Doesn't \d{1,2} mean 1 or 2 digits?

Yes. And \s{1,2} means 1 or 2 whitespace characters.

Would \s+ be more appropriate prior to \d{1,2}?

That works too. \s{1,2} is obviously slightly stricter than \s+.

This did not work:

if [datetimein] =~ /^\s{16}\w+\s\w+\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\d{4}$/ {
  drop { }
}

Thoughts?

Nothing that stands out, but I still haven't seen exactly what datetimein looks like. Please use a stdout { codec => rubydebug } output and show the results as preformatted text.
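
For example, a minimal debugging output section along these lines (added alongside your existing outputs while you troubleshoot):

    output {
      stdout { codec => rubydebug }
    }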

" Wed Oct 25 00:00:01 2017" (16 leading spaces, and without the double quotes)
" Wed Nov 1 00:00:01 2017" (16 leading spaces, and without the double quotes)

is the proper format from the database log. That is how the software writes it to the log.

I'll get the output you requested in the morning. Fighting release issues :frowning:

I did enable rubydebug, and here is what happens there - note [1] "_dateparsefailure":

{
    "message" => "",
    "@version" => "1",
    "@timestamp" => "2017-11-07T18:37:09.825Z",
    "beat" => {
        "hostname" => "XXXXXXXX.XXXXXXXX.XXX",
        "name" => "XXXXXXXX.XXXXXXXX.XXX"
    },
    "count" => 1,
    "fields" => {
        "logtype" => "dblog"
    },
    "input_type" => "log",
    "offset" => 159639634,
    "source" => "/travel/91/env/tf000112/db/sesdb/sesdb.lg",
    "type" => "log",
    "host" => "XXXXXXXX.XXXXXXXX.XXX",
    "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_dateparsefailure",
        [2] "pii"
    ],
    "correlation_id" => "c314bb71-bdbd-45f5-b7d5-1ccec432da75",
    "application" => "dblog",
    "data_version" => "1",
    "hostname" => "XXXXXXXX.XXXXXXXX.XXX",
    "datetimein" => "%{datetimein]0]}",
    "datetimein2" => "",
    "description" => "",
    "tavs_data" => {
        "entityid" => "tf000112",
        "tavsid" => "sesdb"
    }
}

Here is the relevant data from the database log when the database shuts down and restarts - the date line is obvious:

[2017/11/07@10:36:14.037-0800] P-23260 T-140688437077760 I SHUT 25: (542) Server shutdown started by progress on batch.
[2017/11/07@10:36:14.037-0800] P-5942 T-139910365058816 I BROKER 0: (15193) The normal shutdown of the database will continue for 10 Min 0 Sec if required.
[2017/11/07@10:36:14.037-0800] P-5942 T-139910365058816 I BROKER 0: (2248) Begin normal shutdown
.
.
.
[2017/11/07@10:36:14.038-0800] P-9509 T-140012267312896 I SRV 2: (2520) Stopped.
[2017/11/07@10:36:19.042-0800] P-5942 T-139910365058816 I BROKER : (334) Multi-user session end.
[2017/11/07@10:36:20.038-0800] P-23260 T-140688437077760 I SHUT 25: (453) Logout by progress on batch.

                Tue Nov  7 10:37:15 2017

[2017/11/07@10:37:15.584-0800] P-24423 T-139816814819072 I BROKER 0: (333) Multi-user session begin.
[2017/11/07@10:37:15.585-0800] P-24423 T-139816814819072 I BROKER 0: (15321) Before Image Log Initialization at block 0 offset 0.
[2017/11/07@10:37:15.728-0800] P-24423 T-139816814819072 I BROKER 0: (452) Login by progress on batch.
[2017/11/07@10:37:15.730-0800] P-24423 T-139816814819072 I BROKER 0: (5644) Started for sesdb-tf000115 using TCP IPV4 address 0.0.0.0, pid 24423.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4234) Progress OpenEdge Release 11.5 build 1191 SP01 TF33 on Linux XXXXXXXXXXXXXX.XXXXXXXXX.XXX 2.6.32-696.el6.x86_64 #1 SMP Tue Feb 21 00:53:17 EST 2017 x86_64.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4281) Server started by progress on batch.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (6574) Started using pid: 24423.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (9426) Large database file access has been enabled.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (15219) Encryption enabled: 0
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (15824) Multi-tenancy enabled: 0
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (15824) Table Partitioning enabled: 0
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (-----) LRU mechanism enabled.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4282) Parameter File: /travel/91/env/tf000115/conf/sesdb.pf.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (9336) Created shared memory with segment_id: 1474569
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4250) Before-Image Cluster Size: 524288.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4251) Before-Image Block Size: 8192.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (13873) After-image Management Archival Directory List (-aiarcdir): Not Enabled
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (13874) Create After-image Management Archival Directory(s) (-aiarcdircreate): Not Enabled
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (13872) After-image Management Archival Interval (-aiarcinterval): -1
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4256) Number of After-Image Buffers (-aibufs): 20
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4254) After-Image Stall (-aistall): Not Enabled
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (17555) Starting index number for statistics range (-baseindex): 1
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (17554) Starting table number for statistics range (-basetable): 1
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (4252) Number of Before-Image Buffers (-bibufs): 20
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (6552) BI File Threshold Stall (-bistall): Disabled.
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (9238) BI File Threshold size (-bithold): 0.0 Bytes
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (6573) Database Blocksize (-blocksize): 8192
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (12812) BIW writer delay (-bwdelay): 0
[2017/11/07@10:37:16.730-0800] P-24423 T-139816814819072 I BROKER 0: (12813) Allowed index cursors (-c): 600
[2017/11/07@10:37:16.731-0800] P-24423 T-139816814819072 I BROKER 0: (12265) SSL Certificate Store Path (-certstorepath): Not Enabled
[2017/11/07@10:37:16.731-0800] P-24423 T-139816814819072 I BROKER 0: (4264) Character Set (-cpinternal): ISO8859-1
[2017/11/07@10:37:16.731-0800] P-24423 T-139816814819072 I BROKER 0: (4235) Physical Database Name (-db): /travel/91/env/tf000115/db/sesdb/sesdb
[2017/11/07@10:37:16.731-0800] P-24423 T-139816814819072 I BROKER 0: (4238) Direct I/O (-directio): Not Enabled
[2017/11/07@10:37:16.731-0800] P-24423 T-139816814819072 I BROKER 0: (4236) Database Type (-dt): PROGRESS

Please format all logs as preformatted text so the contents don't get mangled.

"datetimein" => "%{datetimein]0]}",

Is this the datetimein value you're comparing against the regexp?

Here is the entire section that addresses the Progress database log:

# progress db logfile

if [fields][logtype] == "dblog" {
  uuid { target => "correlation_id" }
  mutate {
    replace => { application => "dblog" }
    replace => { type => "log" }
    add_field => { "data_version" => "1" }
    add_field => { "hostname" => "%{[beat][hostname]}" }
    add_field => { "datetimein" => "%{message}" }
    add_field => { "datetimein2" => "" }
    add_field => { "description" => "%{message}" }
  }
  mutate {
    split => ["datetimein", "]"]
  }
  mutate {
    replace => [ "datetimein", "%{[datetimein][0]}" ]
  }
  mutate { gsub => [ "datetimein", "[", "" ] }
  date {
    match => [ "datetimein", "yyyy/MM/dd@HH:mm:ss.SSSZ" ]
    target => "datetimein2"
    timezone => "UTC"
  }
  date {
    match => [ "datetimein", "yyyy/MM/dd@HH:mm:ss.SSSZ" ]
    target => "@timestamp"
    timezone => "UTC"
  }
  grok {
    match => { "source" => "/travel/91/env/%{DATA:[tavs_data][entityid]}/db/%{DATA:[tavs_data][tavsid]}/" }
  }
  # filter to ignore/drop records from DB log showing date change of any type
  # record content always like this "                EEE MMM dd HH:mm:ss yyyy"
  # EXAMPLE "                Wed Nov  1 00:00:01 2017"
  if [datetimein] =~ /^\s{16}\w+\s\w+\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s\d{4}$/ {
    drop { }
  }
}

I wish I knew what

mutate {
  replace => [ "datetimein", "%{[datetimein][0]}" ]
}

actually did... the previous guy who supported Logstash left me high and dry :frowning:
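
For what it's worth, here is a sketch of what that chain of mutate filters appears to do to a typical bracketed line from the database log above; treat it as a reading of the posted config rather than anything authoritative:

    # starting value of datetimein, copied from message:
    #   [2017/11/07@10:36:14.037-0800] P-23260 T-140688437077760 I SHUT 25: (542) Server shutdown started by progress on batch.
    mutate {
      # split on "]" turns datetimein into an array;
      # element 0 is "[2017/11/07@10:36:14.037-0800"
      split => ["datetimein", "]"]
    }
    mutate {
      # replace collapses the array back to a plain string containing only element 0
      replace => [ "datetimein", "%{[datetimein][0]}" ]
    }
    mutate {
      # gsub strips the leading "[", leaving "2017/11/07@10:36:14.037-0800",
      # which is what the date filters then parse with "yyyy/MM/dd@HH:mm:ss.SSSZ"
      gsub => [ "datetimein", "[", "" ]
    }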