Logstash-filter-verifier failing on timestamp


(Kathleen DeRusso) #1

Hi, I've started looking at the Logstash filter verifier, and I was wondering if someone could point me in the direction of what I'm doing wrong.

I have a date filter in my Logstash configuration file that looks like this:

date {
    match => [ "timestamp", "ISO8601" ]
    timezone => "UTC"
    target => "@timestamp"
}

If I pass in a log entry that looks like this:
2018-10-30 14:51:48,116

I parse it correctly (shown is the Rubydebug output):
"@timestamp" => 2018-10-30T14:51:48.116Z,

So this all works as I would expect.
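(For reference, if ISO8601 ever proved too lenient, I believe an equivalent explicit Joda-style pattern for this format would be yyyy-MM-dd HH:mm:ss,SSS, with the comma as a literal:)

date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    timezone => "UTC"
    target => "@timestamp"
}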

Now, to run the Logstash filter verifier, I have created a test case file that looks like this:

{
  "fields": {
    <snip> 
  },
  "input": [
    "2018-10-26 09:25:42,219 <restOfMyMessage>"
  ],
  "ignore": ["host"],
  "expected": [
    {
      "@timestamp": "2018-10-26 09:25:42.219",
      <snip> 
    }
  ]
}

When I run the filter verifier, I get this output:

Running tests in test.json...
Comparing message 1 of 1 from test.json...
--- /tmp/836774239/test.json/1/expected  2018-11-01 08:45:19.800311458 -0400
+++ /tmp/836774239/test.json/1/actual    2018-11-01 08:45:19.798311434 -0400
@@ -1,7 +1,5 @@
{
-  "@timestamp": "2018-10-26 09:25:42.219",
+  "@timestamp": "2018-11-01T12:45:17.467Z",
  <snip> 
 }
Testcase failed, continuing with the rest: 1 message(s) did not match the expectations.
one or more testcases failed

It feels like I'm making a very simple mistake here, but I can't figure out why the date for the same configuration file and input would be different when outputting to rubydebug vs. using the logstash-filter-verifier.

I'm using the latest logstash-filter-verifier (1.5.0) and Logstash is 6.4.1. This behavior occurs regardless of whether or not I use the --logstash-version command line parameter.

Has anyone got this working and can provide some advice? Thanks in advance.


(Mike Barretta) #2

@kathleen
It looks like the @timestamp is not being overridden by your pipeline and is instead using the default value of now, which is then converted to UTC from EDT (+4). If the right value is coming out of Logstash, then maybe there's something funky with the logstash-filter-verifier. It's not an Elastic product, so I don't know too much about it. Maybe post this question on the issue page on GitHub?
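One other thing worth checking: judging by the diff you posted, the verifier compares the serialized form of the event, and Logstash serializes @timestamp in ISO8601 with a T separator and a Z suffix. So even once the date filter fires, I'd expect the expected value in the test case to need to look something like this:

  "expected": [
    {
      "@timestamp": "2018-10-26T09:25:42.219Z"
    }
  ]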


(Kathleen DeRusso) #3

Thanks, I started here because I've seen other posts about this third party software. Trying the github page as well.


(Mike Barretta) #4

@kathleen
One point of advice I'd give is to not override @timestamp in the first place, but to instead create your own date field. @timestamp is really a Logstash metadata field that records when the event was handled: it can be beneficial to know the difference between when the event was created and when it was handled.
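As a sketch, assuming the same source field as your config (request_timestamp is just an example name for the new field), that would be your existing date filter with a different target:

date {
    match => [ "timestamp", "ISO8601" ]
    timezone => "UTC"
    target => "request_timestamp"
}

That way @timestamp keeps its default value (the time Logstash handled the event) and the event's own time lives in request_timestamp.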


(Kathleen DeRusso) #5

@Mike.Barretta thanks for the advice. We hadn’t considered it, but that’s a good idea.


(Wyatt) #6

Hey Mike. I just took your advice and reset some of our logs while keeping the @timestamp as the logstash processing time and creating request_timestamp as the true time the logs were created.

After processing, I started to dig into some of our Kibana visualizations and dashboards. Naturally, we had to update the time fields for time-explicit graphs. However, limitations started to emerge, especially in the dashboards:

  1. Auto sync does not work intuitively. Before, when everything was based off @timestamp, we could set an auto interval in the visualizations in a dashboard, and all the visualizations would automatically sync, stay coherent, and scale based on the time span specified in the top right.
  2. The easy-switch time scale is quite limited. Continuing off of the problem mentioned above, whenever we would change the time span in the top right, none of the request_timestamp-based visualizations would update. To get something like the auto time scaling, we have to explicitly set a lot of filters.
  3. Time-implicit charts are subtle and misleading. Perhaps the trickiest thing we found was that visualizations with an implied time (like pie charts) would not update or change along with request_timestamp. All of those visualizations would lose much of their usefulness, since their timestamp would be roughly the day they were processed in Logstash, not their true time (which spans years).

With these points, we are considering doing sort of the inverse of what was mentioned, where @timestamp is the true time the log was created and (e.g.) elastic_timestamp is when Logstash actually processed the event. If we do this, it seems like we get the best of both worlds: the power and ease of use of syncing and fast timescale changes in Kibana, as well as tracking the Logstash processing time. What are your opinions on inverting this? Do you see any possible limitations, or anything that we missed?
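Concretely, the inverted pipeline we have in mind would be roughly this (elastic_timestamp is our name for the processing-time field; we believe the mutate filter's copy option preserves @timestamp before the date filter overwrites it):

# preserve Logstash's processing time before overwriting @timestamp
mutate {
    copy => { "@timestamp" => "elastic_timestamp" }
}
# then parse the event's true time into @timestamp
date {
    match => [ "timestamp", "ISO8601" ]
    timezone => "UTC"
    target => "@timestamp"
}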

We are looking forward to hearing your advice on this, and thank you in advance! I hope everything was explained clearly enough.

Note: We are aware that, in the future, the two different timestamps converge. However, this does not do much for our historical data.


(Mike Barretta) #7

@wyatt
I think the problem is that your index pattern in Kibana is still using @timestamp as the date field. You can verify that by viewing the pattern in Kibana and seeing if it still has the little clock icon next to it (at least it has one in 6.4; I can't remember if it was there in earlier versions).

If you delete the index pattern and create it again, this time selecting request_timestamp as the date field, all the timepicker stuff should work as expected.


(Wyatt) #8

That's really convenient, I had no idea about it. Thank you!


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.