Yet another @timestamp log file trouble... but it should be easy!

Hi again forum!

I never thought it could be this hard... but I'm unable to properly handle data acquired from a log file, because of a date problem.
Google is full of entries on the subject, but none solved my problem, and I'm afraid I'm complicating things too much, making them worse.

My problem is that I have a log file, quite simple in its date format, and surprisingly short I think:
.... 2015-12-31 00:00:00 ....
so I figure a pattern of YYYY-MM-dd HH:mm:ss

My grok filter apparently matches all log fields quite well... including the date fields, which I do like this:

.... %{YEAR:sc_year}-%{MONTHNUM:sc_month}-%{MONTHDAY:sc_day} %{TIME:sc_time} ...

And I got my nice sc_year, sc_month, sc_day and sc_time correctly...

Now the problem, obviously, comes as I try to fill the @timestamp field with the values read from the log, instead of the default @timestamp value, which reflects the capture time.
I have struggled with tons of combinations, leading to no success...

So I post here my last setup (just the latest, probably not the best, I'm getting lost...), hoping that someone could point me in the right direction.
The grok match line is cut since it was too long; I pasted just the relevant portion.

....
grok {
  match => [ "message", " ... %{YEAR:sc_year}-%{MONTHNUM:sc_month}-%{MONTHDAY:sc_day} %{TIME:sc_time} .... " ]
}

mutate {
  add_field => { "mytimestamp" => "%{sc_year}-%{sc_month}-%{sc_day} %{sc_time}.000+02:00" }
}

date {
  locale => "en"
  match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss,SSSz" ]
  timezone => "Europe/Madrid"
  target => "@timestamp"
  add_tag => [ "tsmatch" ]
}

....

Could you give me a clue?

Thank you in advance, best regards!

If the date filter fails it will tell you why via the Logstash log.

I suspect the problem is with the "z" token in the date match pattern. As stated in the Joda-Time documentation, "z" is the timezone name and can't be parsed, and even if it could it wouldn't work since your mytimestamp field contains a timezone offset. You'll have to use "Z" for that.
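
If you did want to keep the offset in the field, something along these lines should match the mytimestamp your mutate actually builds (note that add_field produces a period before the milliseconds and an offset with a colon):

date {
  # ZZ parses a numeric offset written with a colon, such as +02:00;
  # lowercase z (a time zone *name*) cannot be parsed at all.
  match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss.SSSZZ" ]
}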

However, since you explicitly set the timezone for the date filter you don't have to include the offset in the field you're parsing. Also, if the system Logstash runs on has the same timezone as the system producing the log you can just rely on the default.

Finally, instead of parsing the timestamp into separate fields that you join afterwards, why not parse them into that field from the start?

Untested suggestion:

filter {
  grok {
    match => [
      "message",
      " ... (?<mytimestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) .... "
    ]
  }
  date {
    match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
    locale => "en"
    add_tag => [ "tsmatch" ]
  }
}

Thank you for your prompt help...

To be sincere, the reason I'm using the intermediate mytimestamp field is testing and testing, again and again... I didn't try anything this complicated at the very beginning!

The base problem is that I haven't found the essential logic behind @timestamp clearly explained or documented: its syntax and which parts are mandatory or optional.
I have been testing empirically, and the only thing I've been sure of is that Kibana histograms refuse all my "custom time" fields and only "accept" the original @timestamp (which obviously has the right format :smile: ), so I have tried a lot of configs with .SSS, ,SSS, z, Z and all their combinations, in addition to different "mytimestamp" match setups.

Effectively, you're right... my Logstash logs were full of warnings, with either "too short" or bad-format complaints.
Tomorrow I will try your approach... it looks fresh and clean!

Also, it is important to me to consolidate knowledge... so it seems I'm learning that @timestamp has to include ss,SSS... but my original log does not have ss or SSS info! Where are those values finally taken from? ...obviously we are declaring them.
Also, I do not understand what happened to the z or Z stuff... are we ignoring it? Or is it implicitly computed from the locale setting, or whatever magic?

Thank you very much for your help!!!

Also, it is important to me to consolidate knowledge... so it seems I'm learning that @timestamp has to include ss,SSS... but my original log does not have ss or SSS info! Where are those values finally taken from? ...obviously we are declaring them.

If seconds or milliseconds aren't included in the timestamps from the logs (i.e. you don't have ss or SSS tokens in your date matching pattern) they will default to zero.
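
For example (hypothetical, since your logs do carry seconds), a pattern with only minute resolution:

date {
  # Hypothetical value such as "2015-12-31 08:15": no ss or SSS token in the
  # pattern, so seconds and milliseconds in @timestamp default to zero.
  match => [ "mytimestamp", "YYYY-MM-dd HH:mm" ]
}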

Also, I do not understand what happened to the z or Z stuff... are we ignoring it? Or is it implicitly computed from the locale setting, or whatever magic?

I guess you could call it magic. The timestamp that's parsed is treated as local time in the current system's timezone and then turned into UTC for storage in the @timestamp field. You should obviously only rely on this if you know that the original log was produced in the same timezone.
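
To illustrate with the timestamp from your first post, and making the timezone explicit instead of relying on the host's default (Europe/Madrid is CET, +01:00, in winter):

date {
  match    => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss" ]
  # Being explicit instead of relying on the host's default timezone:
  timezone => "Europe/Madrid"
}
# "2015-12-31 00:00:00" (local) -> @timestamp "2015-12-30T23:00:00.000Z" (UTC)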

...mmm... missed shot

I altered my current filter setup with your suggested approach:
The result is that @timestamp keeps its original value, the file read time.

I XDELETED all old data... and restarted Logstash to make it read the file from the beginning again, so every shot would be crystal clear and easy to analyze... and still no luck... there is clearly a date mismatch.

Seems all this is far more complex than I thought!

I XDELETED all old data... and restarted Logstash to make it read the file from the beginning again

"XDELETED", what do you mean?

Here's a complete working example:

$ cat test.config
input { stdin { codec => plain } }
output { stdout { codec => rubydebug } }

filter {
  grok {
    match => [
      "message",
      "(?<mytimestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME})"
    ]
  }
  date {
    match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
    locale => "en"
    add_tag => [ "tsmatch" ]
  }
}
$ cat data
2015-01-02 12:13:14,123
$ /opt/logstash/bin/logstash -f test.config < data
{
        "message" => "2015-01-02 12:13:14,123",
       "@version" => "1",
     "@timestamp" => "2015-01-02T11:13:14.123Z",
           "host" => "name-of-host",
    "mytimestamp" => "2015-01-02 12:13:14,123",
           "tags" => [
        [0] "tsmatch"
    ]
}

Verify that it works for you too, then proceed with adapting it to what your logs really look like.

There was plenty of crap due to ingesting the file data over and over... I know I have read there are mechanisms to avoid log entry duplication, but there was either duplicate data or messed-up entries...

so I cleared the Elasticsearch entries and started from a fresh, empty database:
To do this I issued a curl command from the CLI,
and as I read in Internet posts, blogs and documentation, it is done with the -XDELETE parameter, followed by the Elasticsearch query...
It worked, and I cleared the old crap entries.
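For the record, it was something along these lines (localhost is an assumption on my part; the index name is one of those Logstash had created):

$ curl -XDELETE 'http://localhost:9200/logstash-2015.05.13'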

I restarted Logstash afterwards and let it ingest my log file again, using your modifications... but somehow it didn't work.

I will post my full filter...

This is my current filter... it's my first one... I'm sure it could be better :smile:
It is being applied, since the log entries in the test file are caught and I can see them... they get their GeoIP stuff... almost ready! But @timestamp is mismatching.

filter {
  if "Shoutcast" in [tags] {
    grok {
      match => [ "message", "%{IP:src_ip} %{IP:src_dns} (?<mytimestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) %{NOTSPACE:stream} %{NUMBER:c_reply} %{NOTSPACE:user_agent} %{NUMBER:sc_bytes} %{NUMBER:x_duration} %{NUMBER:avgbandwidt} %{NOTSPACE:c_query}" ]
    }
    # Manually decode URL-escaped characters in stream and user_agent
    mutate {
      gsub => [
        "stream", "/stream?title=", "",
        "stream", "%20", " ",
        "stream", "%2D", "-",
        "stream", "%21", "!",
        "stream", "%28", "(",
        "stream", "%29", ")",
        "stream", "%2B", "+",
        "stream", "%2F", "/",
        "stream", "%2E", ".",
        "stream", "%2C", ",",
        "stream", "%3A", ":",
        "stream", "%40", "@",
        "stream", "%5F", "_",
        "user_agent", "%20", " ",
        "user_agent", "%2D", "-",
        "user_agent", "%21", "!",
        "user_agent", "%28", "(",
        "user_agent", "%29", ")",
        "user_agent", "%2B", "+",
        "user_agent", "%2F", "/",
        "user_agent", "%2E", ".",
        "user_agent", "%2C", ",",
        "user_agent", "%3A", ":",
        "user_agent", "%40", "@",
        "user_agent", "%5F", "_"
      ]
    }

    date {
      match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
      locale => "en"
      add_tag => [ "tsmatch" ]
    }

    mutate {
      remove_field => [ "message" ]
      remove_tag => [ "_grokparsefailure" ]
    }

    if [src_ip] {
      geoip {
        source => "src_ip"
        target => "geoip"
        database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
      }
      mutate {
        convert => [ "[geoip][coordinates]", "float" ]
      }
      # Fall back to the destination IP if the source IP couldn't be resolved
      if ![geoip][ip] {
        if [dest_ip] {
          geoip {
            source => "dest_ip"
            target => "geoip"
            database => "/opt/logstash/vendor/geoip/GeoLiteCity.dat"
            add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
            add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
          }
          mutate {
            convert => [ "[geoip][coordinates]", "float" ]
          }
        }
      }
    }
  }
}

I will post a query result to show how it looks on my machine...

Here is a query result excerpt

{
"_index" : "logstash-2015.05.13",
"_type" : "STRtesting",
"_id" : "LCqk5l2jTaKsM1-cBF7aHg",
"_score" : 3.8455932,
"_source":{"@version":"1","@timestamp":"2015-05-13T09:46:35.458Z","type":"STRtesting","host":"graphs","path":"/var/log/shoutcast/testingl/sc_w3c.log","tags":["Shoutcast","testing"],"src_ip":"213.188.54.252","src_dns":"213.188.54.252","mytimestamp":"2015-05-13 00:40:23","stream":"PACHO %26 PEPO - CLONING SOUND - FROM 00:00 TO 1:00 CET","c_reply":"200","user_agent":"VLC/2.1.6 LibVLC/2.1.6","sc_bytes":"30900224","x_duration":"94241","avgbandwidt":"2616","c_query":"GET","geoip":{"ip":"213.188.54.252","country_code2":"CH","country_code3":"CHE","country_name":"Switzerland","continent_code":"EU","region_name":"15","city_name":"Saint Gallen","latitude":47.425399999999996,"longitude":9.377600000000001,"timezone":"Europe/Zurich","real_region_name":"Sankt Gallen","location":[9.377600000000001,47.425399999999996],"coordinates":[9.377600000000001,47.425399999999996]}}
}

As can be seen, @timestamp and mytimestamp do differ...
Moreover, mytimestamp does not appear to have any .SSS information... so maybe the date is not matching and therefore no modification is being made.

Where's your date filter?

Unrelated to your problem, you should look into using the urldecode filter for your stream and user_agent fields instead of the current gsub thing.

...uh... where is it?... this question makes me unsure...

The only "thing" about date "filter" is the piece of config between mutate-gsub and mutate-remove_fields...

I will edit and bold it...

Thank you very much for your help!

Thank you very much for your suggestion...

Now I use the urldecode filter and my filter looks much cleaner:

...
mutate {
  gsub => [
    "stream", "/stream?title=", ""
  ]
}

urldecode {
    field => "stream"
}
urldecode {
    field => "user_agent"
}

...

Thank you!

OK... always with your date filter setup in place... this is what the Logstash logs show... it may be interesting for the experts...

{:timestamp=>"2015-05-13T13:36:06.260000+0200", :message=>"Failed parsing date from field", :field=>"mytimestamp", :value=>"2015-05-13 12:28:27", :exception=>java.lang.IllegalArgumentException: Invalid format: "2015-05-13 12:28:27" is too short, :level=>:warn}
{:timestamp=>"2015-05-13T13:36:06.267000+0200", :message=>"Failed parsing date from field", :field=>"mytimestamp", :value=>"2015-05-13 12:28:36", :exception=>java.lang.IllegalArgumentException: Invalid format: "2015-05-13 12:28:36" is too short, :level=>:warn}
{:timestamp=>"2015-05-13T13:36:06.269000+0200", :message=>"Failed parsing date from field", :field=>"mytimestamp", :value=>"2015-05-13 12:28:36", :exception=>java.lang.IllegalArgumentException: Invalid format: "2015-05-13 12:28:36" is too short, :level=>:warn}

Seems the filter lacks data in order to apply... doesn't it?

Best regards!

So, your mytimestamp field looks like this:

2015-05-13 00:40:23

And here's your date filter:

date {
  match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
  locale => "en"
  add_tag => [ "tsmatch" ]
}

Your date pattern includes milliseconds but your mytimestamp field only has second resolution. Drop ",SSS" from the pattern.
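
So the filter becomes (untested, as before):

date {
  match => [ "mytimestamp", "YYYY-MM-dd HH:mm:ss" ]
  locale => "en"
  add_tag => [ "tsmatch" ]
}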

Hi.... I tried that way yesterday... trying to "match" it, but to no avail...

I have abandoned the log file reading approach... instead, I'm trying with logstash-forwarder.
The idea behind reading from log files was that the current server already uses AWSTATS and receives periodic copies of the logfiles... so I thought I could reuse that work, but I have to admit I feel this is not gonna work... it should, but it doesn't... so:
One more Logstash @timestamp-related, unsolved thread for the Google searchers! :smile:

I feel the project turns very, very hostile once you try to do something different from the apache, syslog, suricata, firewall... scenarios.
So now I will struggle with logstash-forwarder (I feel it will not be easy).

Thank you very much to share your knowledge!

Hi.... I tried that way yesterday... trying to "match" it, but to no avail...

If you show us what you get we might be able to help. An "it doesn't work" statement isn't useful.

I have abandoned the log file reading approach... instead, I'm trying with logstash-forwarder.

You'll still want to parse timestamps and logstash-forwarder doesn't even do that. It's basically equivalent to Logstash's file input plugin.
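
For reference, a minimal sketch of the receiving end (the port and certificate paths are just placeholders); all the grok/date parsing still happens in your filter section:

input {
  lumberjack {
    # logstash-forwarder ships raw lines over the lumberjack protocol;
    # it does no parsing of its own.
    port            => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key         => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}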

Great! Why not! I will redo my experiments and post the results.
Basically, yesterday I was "chasing" the format based on the log results.
I tried YYYY-MM-dd HH:mm:ss, YYYY-MM-dd HH:mm:ss.SSS and so on.

I remember trying all combinations, but maybe I didn't, or maybe I was fooled by the presence of old, unmatched entries...
Now that I have learned how to clear all entries from the CLI, I will save a log file and try again tomorrow.

You're right about the convenience of resolving the @timestamp mismatch... but the impact on accuracy is much smaller when using logstash-forwarder in my scenario. This is because the log files are copied by a cron job from the service servers to the AWSTATS server at noticeable intervals.
logstash-forwarder @timestamps will be closer to the log timestamps, as events are sent to the server every few seconds... But in the end, I agree with you: a good job is a good job, and this has to work somehow!

Thanks again! best regards!

OK... so, after solving some problems with logstash-forwarder, I got a flow of logs into the Logstash server.
In order to keep things easy I send every single log that comes through lumberjack to stdout...
So the output is uncluttered and veeeeery easy to inspect, and I can spot the logs one by one as they come in.
That way... I can concentrate on the @timestamp issue again.

As expected, @timestamp and mytimestamp are now very close, although I notice a 2-hour lag, probably due to a timezone mismatch between the sending and receiving sides (I should check this, but it is strange); both date/times are now almost matching.

So here's one example:
...
...
{
           "message" => "188.63.201.54 188.63.201.54 2015-05-14 02:00:17 /stream?title=SERKAL%3FA%3FS%20%2D%20SEBASTIAN%20OSCILLA%20AND%20NATHAN%20POLE%20%2D%20ELECTRONIC%20EXULTATION%20%2D%20FROM%2001%3A00%20TO%202%3A00%20CET 200 MPEG%20OVERRIDE 2572554 191 107744",
          "@version" => "1",
        "@timestamp" => "2015-05-14T00:00:18.895Z",
              "type" => "logs",
              "file" => "/var/log/shoutcast/stream1.w3c.log",
              "host" => "nodo01",
            "offset" => "0",
       "stream_type" => "Shoutcast-1.9.8",
      "cluster_node" => "nodo01.cloud01",
    "cluster_stream" => "stream1",
              "tags" => [
        [0] "Shoutcast-w3c"
    ],
            "src_ip" => "188.63.201.54",
           "src_dns" => "188.63.201.54",
       "mytimestamp" => "2015-05-14 02:00:17",
            "stream" => "SERKAL?A?S - SEBASTIAN OSCILLA AND NATHAN POLE - ELECTRONIC EXULTATION - FROM 01:00 TO 2:00 CET",
           "c_reply" => "200",
        "user_agent" => "MPEG OVERRIDE",
          "sc_bytes" => "2572554",
....
....
....

Right now, @timestamp is untouched

while logstash.log complains like this:
{:timestamp=>"2015-05-14T02:12:33.907000+0200", :message=>"Failed parsing date from field", :field=>"mytimestamp", :value=>"2015-05-14 02:12:31", :exception=>java.lang.IllegalArgumentException: Invalid format: "2015-05-14 02:12:31" is too short, :level=>:warn}
{:timestamp=>"2015-05-14T02:12:41.399000+0200", :message=>"Failed parsing date from field", :field=>"mytimestamp", :value=>"2015-05-14 02:12:39", :exception=>java.lang.IllegalArgumentException: Invalid format: "2015-05-14 02:12:39" is too short, :level=>:warn}
{:timestamp=>"2015-05-14T02:12:41.403000+0200", :message=>"Failed parsing date from field", :field=>"mytimestamp", :value=>"2015-05-14 02:12:40", :exception=>java.lang.IllegalArgumentException: Invalid format: "2015-05-14 02:12:40" is too short, :level=>:warn}

...reaaaaaaally close...

Got it partially working!!!

You were right, and I was fooled by the clutter... now I see it is working to some extent.
@timestamp is now being updated with the log date and time!!!

The date filter pattern had to be "YYYY-MM-dd HH:mm:ss" in order to match... with no milliseconds etc. involved.

The problem, though, is still noticeable:
@timestamp is 2 hours back in time...

EDIT:
Finally got it working!
An additional ruby filter setting event['@timestamp'] had to be put in place, after the date filter, in order to "increase" @timestamp from UTC to my +02:00 local time.

So... SOLVED!!!!

Thank you very very much!
Best regards

An additional ruby filter setting event['@timestamp'] had to be put in place, after the date filter, in order to "increase" @timestamp from UTC to my +02:00 local time.

No, don't do that. @timestamp is supposed to be UTC. Kibana and other tools rely on that. Save yourself future surprises by not mucking around with this.