Logstash filter verifier - any working example?

After attempting to fix the logstash directory, I tried again.
./logstash-filter-verifier syslog.json test.config
resulted in:
Running tests in syslog.json...
Error running Logstash: exit status 1.
Process output:
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs to console
23:00:39.073 [main] FATAL logstash.runner - An unexpected error occurred! {:error=>#<ArgumentError: Path "/usr/share/logstash/data" must be a writable directory. It is not writable.>, :backtrace=>["/usr/share/logstash/logstash-core/lib/logstash/settings.rb:420:in validate'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:202:in validate_value'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:118:in validate_all'", "org/jruby/RubyHash.java:1342:in each'", "/usr/share/logstash/logstash-core/lib/logstash/settings.rb:117:in validate_all'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:210:in execute'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67:in run'", "/usr/share/logstash/logstash-core/lib/logstash/runner.rb:183:in run'", "/usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132:in run'", "/usr/share/logstash/lib/bootstrap/environment.rb:71:in (root)'"]}

    The process wrote nothing to its logfile.

Both logstash.yml and log4j2.properties are in /etc/logstash.
Since it is asking for write access to /usr/share/logstash/data, does this command have to be run with sudo, or did the package installer set the directory permissions incorrectly?

On Ubuntu 14.04, at least, Logstash 5.2 is installed in /usr/share/logstash. Do I have to change this in the .go file and recompile? It seems there is no way to discover this from the environment.

You can use the --logstash-path option to change the path to the Logstash executable. I suppose the default value will eventually be adjusted, but that'll have to happen on a major version bump (so probably 2.0).
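For example, something like this should work (assuming the package's usual layout, where the executable ends up in /usr/share/logstash/bin/logstash):

./logstash-filter-verifier --logstash-path /usr/share/logstash/bin/logstash syslog.json test.config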

Since it is asking for write access to /usr/share/logstash/data, does this command have to be run with sudo, or did the package installer set the directory permissions incorrectly?

sudo shouldn't be necessary. I thought LFV was compatible with 5.2, but I haven't checked it myself. I'm going to reopen Support for logstash 5.0 · Issue #8 · magnusbaeck/logstash-filter-verifier · GitHub.
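If you'd rather not use sudo, a rough workaround is to make the data directory writable for your own user (adjust the path and ownership to your installation; the path comes from the error message above):

ls -ld /usr/share/logstash/data
sudo chown -R $USER /usr/share/logstash/data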

sudo did work. However, I am wondering why I get "Testcase failed" when, as far as I can tell, it is parsing everything (in fact more than I was expecting).

For the syslog example, the following is the test case:
{
  "fields": {
    "type": "syslog"
  },
  "input": [
    "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
  ],
  "expected": [
    {
      "@timestamp": "2015-10-06T20:55:29.000Z",
      "host": "myhost",
      "message": "This is a test message",
      "pid": 31993,
      "program": "myprogram",
      "type": "syslog"
    }
  ]
}
The filter that I am testing for the syslog is:
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:[@metadata][timestamp]} %{SYSLOGHOST:host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}" }
  }
  date {
    match => [ "[@metadata][timestamp]", "ISO8601", "MMM dd yyyy HH:mm:ss",
               "MMM d yyyy HH:mm:ss", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
and the output received:
sudo ./logstash-filter-verifier syslog.json test-syslog.conf
Running tests in syslog.json...
Comparing message 1 of syslog.json...
--- /tmp/519580950/syslog.json/1/expected 2017-02-28 18:12:37.337758013 +0000
+++ /tmp/519580950/syslog.json/1/actual 2017-02-28 18:12:37.337758013 +0000
@@ -1,8 +1,17 @@
{
- "@timestamp": "2015-10-06T20:55:29.000Z",
- "host": "myhost",
- "message": "This is a test message",
- "pid": 31993,
+ "@timestamp": "2017-02-28T18:12:34.314Z",
+ "host": [
+ "cfa",
+ "myhost"
+ ],
+ "message": [
+ "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message",
+ "This is a test message"
+ ],
+ "pid": "31993",
"program": "myprogram",
+ "tags": [
+ "_dateparsefailure"
+ ],
"type": "syslog"
}
Testcase failed, continuing with the rest: 1 message(s) did not match the expectations.
One or more testcases failed.
From the above, it is also parsing something other than just the testcase input. For example, cfa is in the hosts file for localhost but does not appear in the testcase message. I expected @timestamp to be mapped to the log timestamp, yet the actual result shows the time at which Logstash received the event. The tags indicate a _dateparsefailure, even though the date appears to have been transformed from the syslog timestamp into ISO8601. Are these indications that I need to use overwrite to clean up the result for it to pass?

I cleaned it up by using overwrite:
overwrite => ["message", "host", "pid"]
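Roughly, the grok block now looks like this (only the overwrite line was added to the filter shown earlier):

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:[@metadata][timestamp]} %{SYSLOGHOST:host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}" }
    # replace the fields that already exist on the event with the parsed values
    overwrite => ["message", "host", "pid"]
  }
}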
The output, although cleaner, still fails:
sudo ./logstash-filter-verifier syslog.json test-syslog.conf
Running tests in syslog.json...
Comparing message 1 of syslog.json...
--- /tmp/769194887/syslog.json/1/expected 2017-02-28 18:41:39.178013589 +0000
+++ /tmp/769194887/syslog.json/1/actual 2017-02-28 18:41:39.178013589 +0000
@@ -1,8 +1,11 @@
{
- "@timestamp": "2015-10-06T20:55:29.000Z",
+ "@timestamp": "2017-02-28T18:41:36.211Z",
"host": "myhost",
"message": "This is a test message",
- "pid": 31993,
+ "pid": "31993",
"program": "myprogram",
+ "tags": [
+ "_dateparsefailure"
+ ],
"type": "syslog"
}
Testcase failed, continuing with the rest: 1 message(s) did not match the expectations.
One or more testcases failed.
I am not sure why @timestamp continues to fail. I thought that by capturing the timestamp into [@metadata][timestamp] (https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html) it would be (and seems to be) saved as @timestamp. Is there any way around this issue? Otherwise, performing analysis on any historical log will always result in this kind of failure.

Is it possible that the "_dateparsefailure" is occurring because the year isn't included and that in this case it would represent a future event? I wouldn't expect that to be an issue for a parsing engine, but could it be possible?

The pid being a string is more difficult. The filter is obviously parsing it, but even casting it to an integer with %{SYSLOGPROG:pid:int} still results in a string where the expected result has an integer value.

Doh. I found the issue with the "_dateparsefailure": I blindly copied the date filter text where it said syslog without checking that it actually matched the example. The date patterns required "year" information. Once corrected, the "_dateparsefailure" was fixed.

I still don't understand why I can't force the pid to an int, nor why the timestamp still gets replaced with the received-at timestamp.

new filter:
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:[@metadata][timestamp]} %{SYSLOGHOST:host} %{SYSLOGPROG:pid:int}: %{GREEDYDATA:message}" }
    overwrite => ["message", "host", "pid"]
  }
  date {
    match => [ "[@metadata][timestamp]", "MMM d HH:mm:ss", "MMM dd HH:mm:ss", "MMM dd yyyy HH:mm:ss",
               "MMM d yyyy HH:mm:ss", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601" ]
    remove_field => "timestamp"
  }
}

same syslog test case:
{
  "fields": {
    "type": "syslog"
  },
  "input": [
    "Oct 6 20:55:29 myhost myprogram[31993]: This is a test message"
  ],
  "expected": [
    {
      "@timestamp": "2015-10-06T20:55:29.000Z",
      "host": "myhost",
      "message": "This is a test message",
      "pid": 31993,
      "program": "myprogram",
      "type": "syslog"
    }
  ]
}

result:
sudo ./logstash-filter-verifier syslog.json test-syslog.conf
Running tests in syslog.json...
Comparing message 1 of syslog.json...
--- /tmp/391551220/syslog.json/1/expected 2017-03-01 01:28:17.712330949 +0000
+++ /tmp/391551220/syslog.json/1/actual 2017-03-01 01:28:17.712330949 +0000
@@ -1,8 +1,8 @@
{
- "@timestamp": "2015-10-06T20:55:29.000Z",
+ "@timestamp": "2017-10-06T20:55:29.000Z",
"host": "myhost",
"message": "This is a test message",
- "pid": 31993,
+ "pid": "31993",
"program": "myprogram",
"type": "syslog"
}
Testcase failed, continuing with the rest: 1 message(s) did not match the expectations.
One or more testcases failed.

Too quick on the date issue. I just realized that it did exactly what I said it would do: it replaced the missing year with the present year, which is why it didn't match everything, but it did use the date that was in the message.

So the only remaining item for me now is why I can't force things that look like numbers, and are parsed as numbers, to be represented as numbers. Or is that a problem with the example data, in that even numbers should be quoted?

So the only remaining item for me now is why I can't force things that look like numbers, and are parsed as numbers, to be represented as numbers. Or is that a problem with the example data, in that even numbers should be quoted?

This is because the SYSLOGPROG pattern is defined like this:

SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?

That is, the pattern captures both the program name and the pid, but doesn't use :int to coerce the pid to an integer, and that's nothing you can correct by adding :int to the SYSLOGPROG pattern (even the attempt to capture into the pid field is futile; the only reasonable way to use SYSLOGPROG is plain %{SYSLOGPROG}). Two options:

  • Use %{PROG:program}(?:\[%{POSINT:pid:int}\])? in your grok expression instead of %{SYSLOGPROG}.
  • Use a mutate filter to convert the pid field to an integer (a sketch follows below).
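For the second option, something along these lines should work (a rough sketch; it assumes pid has already been captured as a string by %{SYSLOGPROG}):

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:[@metadata][timestamp]} %{SYSLOGHOST:host} %{SYSLOGPROG}: %{GREEDYDATA:message}" }
    overwrite => ["message", "host"]
  }
  # convert the string pid that SYSLOGPROG captured into an integer
  mutate {
    convert => { "pid" => "integer" }
  }
}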

Thanks Magnus.