Populating @timestamp from the log time

Hi mates,

I am parsing the logs of Bluecoat proxies using Filebeat and indexing them with Logstash. When I search the indexed data in Elasticsearch, @timestamp shows the indexing time. I want @timestamp to be populated from the time in the log file instead. I have used the date filter as suggested by many users in older topics, but @timestamp is still set to the indexing time (or Filebeat's processing time). I would like to know of any alternative; a proper solution would be even better. I am posting my Logstash configuration here for reference.

input {
    beats {
        port => "5044"
    }
}

filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} (%{NUMBER:time_taken}|\-) (%{IP:sourceIP}|\-) (%{NUMBER:status}|\-) (%{WORD:action}|\-) (%{NUMBER:scBytes}|\-) (%{NUMBER:csBytes}|\-) (%{WORD:method}|\-) (%{WORD:uri_scheme}|\-) (%{URIHOST:url}|\-) (%{NUMBER:port}|\-) (?<uri_path>([a-zA-Z0-9\/\.\?\-\_\~\%\[\]]+)|(\/)) (?<uri_query>([a-zA-Z0-9\/\.\?\-\=\&\%]+)) (?<username>([a-zA-Z0-9\/\.\?\-\$]+)) (?<auth_group>([a-zA-Z0-9\/\.\?\-\$]+)) (?<destIP>([a-zA-Z0-9\.\-]+)) (?<content_type>([a-zA-Z0-9\-\/\;\%\=]+)) (?<referer>[a-zA-Z0-9\-\/\;\%\=\_\:\.\[\]]+) (%{QUOTEDSTRING:user_agent}|\-) (%{WORD:filter_result}|\-) (%{QUOTEDSTRING:category}|\-) (?<vir_id>([a-zA-Z0-9\-\/.])) (%{IP:proxyIP}|\-)" }
    }
    date {
        match => ["timestamp", "ISO8601"]
    }
}
#end of filter


output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "proxylog"
    }
}

Please show an example event. Copy/paste from Kibana's JSON tab or use some other means to show the raw event.

{ "_index": "proxylog", "_type": "log", "_id": "AV8VtZgfD01jiLtMayQE", "_version": 1, "_score": null, "_source": { "auth_group": "-", "referer": "-", "proxyIP": "10.145.22.31", "source": "D:\\bluecoat logs\\SG_main_IP31_M10_D12175036.log", "type": "log", "scBytes": "356324", "uri_path": "/", "filter_result": "OBSERVED", "sourceIP": "10.142.91.76", "content_type": "-", "@version": "1", "beat": { "hostname": "AZSG-P-EOPIRMC", "name": "AZSG-P-EOPIRMC", "version": "5.5.2" }, "host": "AZSG-P-EOPIRMC", "action": "TCP_TUNNELED", "user_agent": "\"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"", "destIP": "10.145.22.31", "vir_id": "-", "offset": 18126822, "method": "CONNECT", "input_type": "log", "message": "2017-10-12 12:22:49 162539 10.142.91.76 200 TCP_TUNNELED 356324 2073 CONNECT tcp i9.ytimg.com 443 / - VSOWM - 10.145.22.31 - - \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\" OBSERVED \"Allowed Multimedia Domains;Audio/Video Clips;Content Servers\" - 10.145.22.31", "time_taken": "162539", "uri_scheme": "tcp", "url": "i9.ytimg.com", "tags": [ "beats_input_codec_plain_applied", "_dateparsefailure" ], "@timestamp": "2017-10-13T12:27:50.222Z", "uri_query": "-", "mytimestamp": "2017-10-12 12:22:49", "port": "443", "csBytes": "2073", "category": "\"Allowed Multimedia Domains;Audio/Video Clips;Content Servers\"", "status": "200", "username": "VSOWM" }, "fields": { "@timestamp": [ 1507897670222 ] }, "sort": [ 1507897670222 ] }
This is a sample of an indexed event, taken from the Kibana response. I am using ELK 5.5.2 with a Filebeat of the same version. I can see a _dateparsefailure tag and I am not sure what is causing it. I have just moved from Splunk to ELK, as we are planning to use ELK from here on, but that requires all the use cases implemented in Splunk to be implemented in ELK as well.

The following is how the data is shown in Kibana
October 13th 2017, 17:57:50.222 auth_group:- referer:- proxyIP:10.145.22.31 source:D:\bluecoat logs\SG_main_IP31_M10_D12175036.log type:log scBytes:0 uri_path:/ filter_result:DENIED sourceIP:10.214.64.60 content_type:- @version:1 beat.hostname:AZSG-P-EOPIRMC beat.name:AZSG-P-EOPIRMC beat.version:5.5.2 host:AZSG-P-EOPIRMC action:TCP_ERR_MISS destIP:0.0.0.0 vir_id:- offset:18,125,553 method:CONNECT input_type:log message:2017-10-12 12:22:49 228541 1

The JSON document and the Kibana row above are from two different events.

Your date filter attempts to parse a timestamp field but your event doesn't have such a field.

I can see a _dateparsefailure tag and I am not sure what is causing it.

The Logstash log should contain clues.

But the log file does have the field with the date and time. Below is a sample log line:

2017-10-12 12:20:36 10 10.123.123.24 200 TCP_HIT 24419 433 GET http clienttemplates.content.office.net 80 /support/templates - - - 2.4.4.48 image/png - "Microsoft Office/15.0 (Windows NT 6.1; Microsoft Excel 15.0.4953; Pro)" OBSERVED "Technology/Internet" - 10.23.3.4

The JSON blob you posted does not have a field named timestamp. It does, however, have a mytimestamp field.
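
For example, with the mytimestamp value shown in that event (2017-10-12 12:22:49), a date filter along these lines is the usual approach (just a sketch of the idea, not tested against your data):

filter {
    date {
        match => ["mytimestamp", "YYYY-MM-dd HH:mm:ss"]
    }
}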

Yes, that's because I changed the name to mytimestamp after posting this topic. Sorry for not updating.

Did you look in the Logstash log for clues? The date filter should log something when it fails and adds a _dateparsefailure tag.

I found that it was because I used "ISO8601". When I used the YYYY-MM-dd format it worked fine. I have now modified the Logstash conf file to use a csv filter in place of grok, since the grok regex I had written was not coping well with the different log formats. The new problems I am facing are:

i) The time is taken from the logs and added to @timestamp, but the time in the @timestamp field is 5 hours 30 minutes behind the time in the log. Please refer to the screenshot and compare the @timestamp and mytimestamp fields.

[screenshot: @timestamp vs. mytimestamp]

ii) When I use the date filter, the data does not show up for searching in Elasticsearch, although I can see the events being processed by Logstash since I am using stdout { codec => rubydebug }. When I remove the date filter, the data appears in Elasticsearch.

Below is my new conf file for Logstash:

input {
    beats {
        port => "5044"
    }
}

filter {
    csv {
        separator => " "
        columns => ["date","time","time_taken","c_ip","sc_status","s_action","sc_bytes","cs_bytes","cs_method","cs_uri_scheme","cs_host","cs_uri_port","cs_uri_path","cs_uri_query","cs_username","cs_auth_group","s_supplier_name","rs_Content_Type","cs_Referer","cs_User_Agent","sc_filter_result","cs_categories","x_virus_id","s_ip"]
        add_field => { "mytimestamp" => "%{date} %{time}" }
    }
    date {
        match => ["mytimestamp", "YYYY-MM-dd HH:mm:ss"]
    }
}

output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        hosts => ["localhost:9200"]
        action => "index"
        index => "proxylog"
    }
}

The time is taken from the logs and added to @timestamp, but the time in the @timestamp field is 5 hours 30 minutes behind the time in the log. Please refer to the screenshot and compare the @timestamp and mytimestamp fields.

That's because @timestamp is always UTC and your timestamp presumably is IST.
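
For reference, if the log timestamps had been written in a timezone other than the one the Logstash host runs in, the date filter's timezone option is how you tell it to interpret them; a minimal sketch, reusing the mytimestamp field from the config above:

filter {
    date {
        match    => ["mytimestamp", "YYYY-MM-dd HH:mm:ss"]
        timezone => "Asia/Kolkata"
    }
}

This only changes how the source string is interpreted; @timestamp itself is still stored in UTC.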

When I use the date filter, the data does not show up for searching in Elasticsearch, although I can see the events being processed by Logstash since I am using stdout { codec => rubydebug }. When I remove the date filter, the data appears in Elasticsearch.

If Logstash can't index documents in ES it'll give you clues in its log.

If so, how do I convert it back? I can see options like 'Z', 'ZZ' and 'ZZZ', and found that they are for the timezone. I want to add +0530 since our timezone is Asia/Kolkata. How can this be achieved?

I checked the log, but there is nothing in it. It only contains messages up to the start of the Logstash agent; apart from that there is nothing about indexing.

@magnusbaeck Thanks a lot for your timely help. I found that it was not a problem with Elasticsearch, Logstash or Kibana; the problem was me being dumb. I cannot laugh enough at how dumb I was. The log contained data for 14th Oct and I was searching with today's date, which resulted in "no results found". As soon as I searched with the time from the log I found that my data was present. Once again, thank you for your help.
Now the only odd thing is the time being in UTC. Is there a way to convert the time to the Asia/Kolkata timezone by adding 5 hours and 30 minutes?

If so, how do I convert it back?

That's not supported. Why does it matter anyway? Kibana adjusts timestamps to the browser's timezone.

It matters because we are using API calls to generate log statistics for the users, and they might complain that they didn't use their systems at the times shown. Is there any other option left, or should I leave the conversion to the presentation layer?

Making timezone adjustments in the presentation layer is the best option.
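
If the presentation layer cannot be changed, one possible workaround (not taken from this thread, and leaving @timestamp untouched) is to add a separate string field carrying the IST wall-clock time via a ruby filter; the field name local_time below is purely illustrative:

filter {
    ruby {
        # Hypothetical sketch: derive an Asia/Kolkata (+05:30) wall-clock string from @timestamp.
        # event.get("@timestamp") is a UTC Logstash timestamp; .time gives a Ruby Time object.
        code => '
            ist = event.get("@timestamp").time + (5 * 3600) + (30 * 60)
            event.set("local_time", ist.strftime("%Y-%m-%d %H:%M:%S"))
        '
    }
}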

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.