Help parsing [06/Sep/2017:10:57:42 -0400]

Hello,

I am using the following regex to parse out [06/Sep/2017:10:57:42 -0400]

[0-9]{2}/[A-z]{3}/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} -[0-9]{4}

I used the HTTPDATE grok, but this is placing my timestamp in a CST timezone instead of EST.

I do not understand why Logstash is throwing an error on this regex when it breaks the data out as intended in my tests.

Thanks.

HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
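For the sample timestamp 06/Sep/2017:10:57:42 -0400, the sub-patterns line up roughly like this:

%{MONTHDAY} -> 06
%{MONTH}    -> Sep
%{YEAR}     -> 2017
%{TIME}     -> 10:57:42
%{INT}      -> -0400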

Why would this fail? The grok filter isn't the one determining the date, but rather the date filter.

Show your date filter configuration, and we might be able to ascertain why you're getting the wrong timezone.

Here is the date filter:
date {
  match => ["event_ts" ,"dd/MMM/yyyy:HH:mm:ss Z"]
  target => "@timestamp"
  timezone => "America/New_York"
}

That is what I thought as well. This data is coming in consistently, every minute or so. When I set the HTTPDATE grok the data "stopped" showing up, but when I set the time picker to the last hour the data was there; it was just showing up in a different timezone, or at least that's what I believe. Once I took off the HTTPDATE grok the data started flowing back consistently as before. Then I added the regex above and it is giving me a "_grokparsefailure". Stdout to the console passed... the grok debugger passed... I no longer trust those tests. That is why I decided to post here, to see if anyone else is experiencing the same.

Thanks

The target => "@timestamp" line is superfluous, as this is the default. It can be safely omitted.

This is probably why you're having timezone issues: you have defined a timezone in two different places (the offset represented by Z in the match pattern, and the timezone option).

What comes out if you remove this?

Which one would you like me to remove? The Z or the actual timezone option? We want the data to show up in Kibana as EST.

Thanks

I decided to do a bit of a test:

I created a datetest.conf:

input { stdin {} }
filter { date { match => ["message" ,"dd/MMM/yyyy:HH:mm:ss Z"] } }
output { stdout { codec => rubydebug } }

And ran it:

bin/logstash -f datetest.conf

And the output was (I pasted your date):

06/Sep/2017:10:57:42 -0400
{
    "@timestamp" => 2017-09-06T14:57:42.000Z,
      "@version" => "1",
          "host" => "REDACTED",
       "message" => "06/Sep/2017:10:57:42 -0400"
}

The 4 hour difference is right there. It's accurate from everything I can tell.

The output will always come out in Zulu notation, which means ISO8601 UTC time, with a Z at the end, as in

"@timestamp" => 2017-09-06T14:57:42.000Z

Kibana will automatically obtain your timezone from the browser and re-interpret this to show time in your timezone. Elasticsearch stores all dates as UTC. The client/browser/Kibana is expected to translate UTC into your local time.

That's the thing, @theuntergeek: when I run the same tests, these all print out to the terminal successfully, but that is not the case when I run it and send it to Kibana... it comes up as a grok parse failure. So I am now testing my grok pattern by pattern, and it is failing in the timestamp portion of the data. That is how I've realized that it's not liking the regex I am adding for the timestamp, and it is placing the data in another timezone.

You should use the HTTPDATE grok pattern, as that's accurate.

I changed the above example:

input { stdin {} }
filter {
  grok {
    match => { "message" => "\[%{HTTPDATE:timestamp}\]" }
  }
  date { match => ["timestamp" ,"dd/MMM/yyyy:HH:mm:ss Z"] }
}
output { stdout { codec => rubydebug } }

And this is now the output, when I paste [06/Sep/2017:10:57:42 -0400]:

[06/Sep/2017:10:57:42 -0400]
{
    "@timestamp" => 2017-09-06T14:57:42.000Z,
      "@version" => "1",
          "host" => "REDACTED",
       "message" => "[06/Sep/2017:10:57:42 -0400]",
     "timestamp" => "06/Sep/2017:10:57:42 -0400"
}

It's now extracting HTTPDATE into the timestamp field. The date filter now uses that field. The results are the same, though now we have an additional timestamp field, which can be removed on successful conversion by adding remove_field => ["timestamp"] to the date filter block. If the conversion is unsuccessful, the field will not be removed.
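For reference, a minimal sketch of that date block with the cleanup option added (assuming you keep the field name timestamp from the grok above):

date {
  match => ["timestamp" ,"dd/MMM/yyyy:HH:mm:ss Z"]
  remove_field => ["timestamp"]
}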

Also note that [ and ] must be escaped in your grok pattern, as they have meaning in regular expressions.


A grok parse failure means something is wrong in Logstash itself; it would behave the same whether run at the command line or launched via an init script. That's why I suggest using HTTPDATE as in my example above.

When I add the HTTPDATE the data is not showing up anymore. I have to set the time picker to the last 1 hour to see it.

date {
  match => ["event_ts" ,"dd/MMM/yyyy:HH:mm:ss Z"]
  target => "@timestamp"
}

I took out the timezone part. You mentioned something about the Z in the match... I am just adding the match as it appears in the timestamp. I am including the Z because the timestamp has it.

This is the log:
10.100.333.12 - - [06/Sep/2017:12:55:43 -0400] "GET /api/core/v3/streams/77077/activities?fields=-resources&count=16&collapse=true HTTP/1.1" 200 44161 1 24 "-" "Java/1.8.0_51" "application/json" - 10.223.232.44 1466771

GROK

%{IP:client_ip}%{SPACE}(?:-|{%USER:client_port})%{SPACE}(?:-|%{USER:ident})%{SPACE}[%{HTTPDATE:event_ts}]

As mentioned above, your grok rule needs to escape the [ and ] characters, as they have special meaning in regular expressions:

%{IP:client_ip}%{SPACE}(?:-|{%USER:client_port})%{SPACE}(?:-|%{USER:ident})%{SPACE}\[%{HTTPDATE:event_ts}\]

Additionally, as also mentioned above, you do not need target => "@timestamp", as this is the default behavior. It is superfluous.

Here is another example, expanded to incorporate the new data you provided. Please note that I was obliged to change 10.100.333.12 to 10.100.33.12 because the 333 is an invalid octet, so the grok rule for IP was failing. I also replaced %{SPACE} with actual spaces, and captured everything else after the timestamp in a field called therest. It works as expected:

input { stdin {} }
filter {
  grok {
    match => { "message" => "%{IP:client_ip} (?:-|{%USER:client_port}) (?:-|%{USER:ident}) \[%{HTTPDATE:event_ts}\]%{GREEDYDATA:therest}" }
  }
  date { match => ["event_ts" ,"dd/MMM/yyyy:HH:mm:ss Z"] }
}
output { stdout { codec => rubydebug } }

And the output:

10.100.33.12 - - [06/Sep/2017:12:55:43 -0400] "GET /api/core/v3/streams/77077/activities?fields=-resources&count=16&collapse=true HTTP/1.1" 200 44161 1 24 "-" "Java/1.8.0_51" "application/json" - 10.223.232.44 1466771
{
    "@timestamp" => 2017-09-06T16:55:43.000Z,
      "event_ts" => "06/Sep/2017:12:55:43 -0400",
      "@version" => "1",
          "host" => "REDACTED",
     "client_ip" => "10.100.33.12",
       "therest" => " \"GET /api/core/v3/streams/77077/activities?fields=-resources&count=16&collapse=true HTTP/1.1\" 200 44161 1 24 \"-\" \"Java/1.8.0_51\" \"application/json\" - 10.223.232.44 1466771",
       "message" => "10.100.33.12 - - [06/Sep/2017:12:55:43 -0400] \"GET /api/core/v3/streams/77077/activities?fields=-resources&count=16&collapse=true HTTP/1.1\" 200 44161 1 24 \"-\" \"Java/1.8.0_51\" \"application/json\" - 10.223.232.44 1466771"
}

I am escaping them; for some reason the backslashes are removed when pasted in here. I have done those tests in the past, see this from a previous post:

My Grok Pattern:

^%{IP:client_ip}%{SPACE}%{USER:client_port}%{SPACE}%{USER:ident}%{SPACE}[%{HTTPDATE:event_ts}]%{SPACE}"%{WORD:method}%{SPACE}%{DATA:uri_path}%{SPACE}HTTP/%{NUMBER:http_version}"%{SPACE}%{NUMBER:status}%{SPACE}(?:-|%{NUMBER:bytes})%{SPACE}%{NUMBER:request_time_in_secs}%{SPACE}%{NUMBER:keep_alives}%{SPACE}"%{GREEDYDATA:referer}"%{SPACE}"%{GREEDYDATA:user_agent}"%{SPACE}"%{GREEDYDATA:contenttype}"%{SPACE}%{DATA:jsession_id}%{SPACE}%{IP:x_clientip}%{SPACE}%{NUMBER:response_time_in_secs}$

stdout output from a test:

"referer" => "https://someurl/docs/DOC-00000",
"method" => "POST",
"response_time_in_secs" => "97849",
"ident" => "-",
"http_version" => "1.1",
"message" => "10.100.200.00 - - [30/Aug/2017:16:20:47 -0400] \"POST /some/path/here HTTP/1.1\" 200 109 0 0 \"https://someurl/docs/DOC-00000\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\" \"application/json\" 2CE9B72CBD4D03C457A2893EF60128F6 10.100.200.200 97849",
"keep_alives" => "0",
"x_clientip" => "10.100.200.200",
"contenttype" => "application/json",
"path" => "/apps/logstash-5.5.0/bin/test.txt",
"client_port" => "-",
"jsession_id" => "3F966A293216B6A0315DF0D1FFCE7AEB",
"@timestamp" => 2017-08-30T20:20:47.000Z,
"uri_path" => "some/path/here",
"bytes" => "109",
"event_ts" => "30/Aug/2017:16:20:47 -0400",
"@version" => "1",
"host" => "myserver",
"request_time_in_secs" => "0",
"client_ip" => "10.100.200.200",
"user_agent" => "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
"status" => "200"

We want a field for each of the fields in the event, so this output would be a bit different than yours. That is what I was saying earlier: the tests work as intended. When I send the data to Kibana it does not work, and I now believe, at least in my case, it's the HTTPDATE grok or any other regexed grok that I am attempting to use. The grok "breaks" when it hits that part of the event. I am now testing the other parts of the grok, and to do so I am having to do this:

%{IP:client_ip}%{SPACE}(?:-|%{USER:client_port})%{SPACE}(?:-|%{USER:ident})%{SPACE}\[%{DATA}\]%{SPACE}\"%{WORD:log_level}%{SPACE}%{URIPATHPARAM:uri_path}%{SPACE}HTTP/%{NUMBER:http_version}

I am not even parsing the date out as it will break the rest. Might there be a bug or something? This issue has me perplexed as all the testing works, but not in Kibana.

Thanks.

Please encapsulate said output in triple back-ticks:

```
PASTED CONTENT
```

I took the liberty of editing your last post to add these.

Perhaps you omitted to escape the double-quotes in your grok rule?

I have it fully functional as follows:

input { stdin {} }
filter {
  grok {
    match => { "message" => "%{IP:client_ip} (?:-|%{USER:client_port}) (?:-|%{USER:ident}) \[%{HTTPDATE:event_ts}\] \"%{WORD:method} %{DATA:uri_path} HTTP/%{NUMBER:http_version}\" %{NUMBER:status:int} (?:-|%{NUMBER:bytes:int}) %{NUMBER:request_time_in_secs:int} %{NUMBER:keep_alives:int} %{QS:referer} %{QS:user_agent} %{QS:contenttype} %{DATA:jsession_id} %{IP:x_clientip} %{NUMBER:response_time_in_secs:int}" }
  }
  date { match => ["event_ts" ,"dd/MMM/yyyy:HH:mm:ss Z"] }
}
output { stdout { codec => rubydebug } }

It's much easier to use QS or QUOTEDSTRING instead of DATA for referrer, agent, and contenttype. I also took the liberty of making numbers int values so they'll be passed on that way.

The output is as follows:

10.100.200.00 - - [30/Aug/2017:16:20:47 -0400] "POST /some/path/here HTTP/1.1" 200 109 0 0 "https://someurl/docs/DOC-00000" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko" "application/json" 2CE9B72CBD4D03C457A2893EF60128F6 10.100.200.200 97849
{
                  "referer" => "\"https://someurl/docs/DOC-00000\"",
                   "method" => "POST",
    "response_time_in_secs" => 97849,
             "http_version" => "1.1",
                  "message" => "10.100.200.00 - - [30/Aug/2017:16:20:47 -0400] \"POST /some/path/here HTTP/1.1\" 200 109 0 0 \"https://someurl/docs/DOC-00000\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\" \"application/json\" 2CE9B72CBD4D03C457A2893EF60128F6 10.100.200.200 97849",
              "keep_alives" => 0,
               "x_clientip" => "10.100.200.200",
              "contenttype" => "\"application/json\"",
              "jsession_id" => "2CE9B72CBD4D03C457A2893EF60128F6",
               "@timestamp" => 2017-08-30T20:20:47.000Z,
                 "uri_path" => "/some/path/here",
                    "bytes" => 109,
                 "event_ts" => "30/Aug/2017:16:20:47 -0400",
                 "@version" => "1",
                     "host" => "REDACTED",
     "request_time_in_secs" => 0,
                "client_ip" => "10.100.200.00",
               "user_agent" => "\"Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko\"",
                   "status" => 200
}

I also note that your quotes, when pasted here, are being translated into smart quotes. I had to edit them to be regular ASCII quotes (not left/right specific) to use them in Logstash.

Thank you @theuntergeek. I see that you are using actual spaces as opposed to %{SPACE}; why is that? I figured %{SPACE} would be the best to use since it's a built-in grok pattern. Secondly, any idea as to why HTTPDATE might not be working when sending the data to Kibana?

It works for me the same way it does in your examples, but I would like it to work in my environment.

Thanks.

%{SPACE} is useful when there's more than one space, or tabs. If it's a single space, it's not needed.
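If I recall correctly, the bundled grok-patterns file defines it along these lines, which is why it also happily matches a single space:

SPACE \s*

So for fields separated by a single space, these two should behave the same:

%{IP:client_ip}%{SPACE}%{USER:ident}
%{IP:client_ip} %{USER:ident}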

First, the data isn't sent to Kibana. It goes to Elasticsearch for indexing. Kibana reads it from there. A properly formatted @timestamp field should be a non-issue in Kibana. I would look to see if you have a mapping conflict in your Elasticsearch logs. It could be that some prior attempt has resulted in improperly mapped data. If you can delete the existing index and try again, it may clear up. You may also need to go to the Management pane of Kibana, and refresh the fields and values for your Index Pattern.
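If it helps, you can also inspect the current mapping straight from Elasticsearch to spot a conflict. Something like the following, substituting your actual index name and host (logstash-2017.09.06 and localhost here are just placeholders):

# hypothetical index name; replace with one of your real indices
curl -XGET 'http://localhost:9200/logstash-2017.09.06/_mapping?pretty'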

Thank you for all your help @theuntergeek I appreciate it. I will look into the mapping conflict you mentioned.

@theuntergeek,

I have been working on this for a couple of days now and have been able to get my timestamp (event_timestamp) to show up in Kibana. The way I was able to do so was by removing the "Z" from the date plugin:

match => ["event_timestamp" ,"dd/MMM/yyyy:HH:mm:ss"]

When I do this I am getting a _dateparsefailure, which makes sense. If I add the "Z" back, my data stops showing up in Kibana.

How can i correct this?

The data must be in Elasticsearch. The answer to the question "why am I not seeing it?" is probably that you have a timezone misconfigured on the computer your browser is running on. What happens if you change the time picker in Kibana to a much bigger time slice than the default 15 minutes? Like 12 or 24 hours?

Also, what is the timezone of the machine you're on?
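For example, on a Linux host you could check with something like this (timedatectl assumes a systemd-based distribution):

# print the timezone abbreviation and UTC offset the host is using
date +'%Z %z'

# on systemd-based systems, show the configured zone
timedatectl | grep 'Time zone'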