I am shipping Bluecoat proxy logs with Filebeat and parsing and indexing them with Logstash. When I search the indexed data in Elasticsearch, the @timestamp field contains the indexing time. I want @timestamp to be populated with the time taken from the log file instead. I have used the date filter as suggested in many older posts, but @timestamp is still set to the indexing time (or Filebeat's processing time). I would like to know of any alternative; a proper solution would be even better. I am posting the Logstash configuration here for your reference.
{ "_index": "proxylog", "_type": "log", "_id": "AV8VtZgfD01jiLtMayQE", "_version": 1, "_score": null, "_source": { "auth_group": "-", "referer": "-", "proxyIP": "10.145.22.31", "source": "D:\\bluecoat logs\\SG_main_IP31_M10_D12175036.log", "type": "log", "scBytes": "356324", "uri_path": "/", "filter_result": "OBSERVED", "sourceIP": "10.142.91.76", "content_type": "-", "@version": "1", "beat": { "hostname": "AZSG-P-EOPIRMC", "name": "AZSG-P-EOPIRMC", "version": "5.5.2" }, "host": "AZSG-P-EOPIRMC", "action": "TCP_TUNNELED", "user_agent": "\"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\"", "destIP": "10.145.22.31", "vir_id": "-", "offset": 18126822, "method": "CONNECT", "input_type": "log", "message": "2017-10-12 12:22:49 162539 10.142.91.76 200 TCP_TUNNELED 356324 2073 CONNECT tcp i9.ytimg.com 443 / - VSOWM - 10.145.22.31 - - \"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\" OBSERVED \"Allowed Multimedia Domains;Audio/Video Clips;Content Servers\" - 10.145.22.31", "time_taken": "162539", "uri_scheme": "tcp", "url": "i9.ytimg.com", "tags": [ "beats_input_codec_plain_applied", "_dateparsefailure" ], "@timestamp": "2017-10-13T12:27:50.222Z", "uri_query": "-", "mytimestamp": "2017-10-12 12:22:49", "port": "443", "csBytes": "2073", "category": "\"Allowed Multimedia Domains;Audio/Video Clips;Content Servers\"", "status": "200", "username": "VSOWM" }, "fields": { "@timestamp": [ 1507897670222 ] }, "sort": [ 1507897670222 ] }
This is a sample document after indexing, taken from the Kibana response. I am using ELK 5.5.2 and Filebeat of the same version. I can see a _dateparsefailure tag, but I am not sure what is causing it. I have just moved from Splunk to ELK, as we are thinking of using ELK from here on, but that requires all the use cases implemented in Splunk to be implemented in ELK as well.
The following is how the data is shown in Kibana:

    October 13th 2017, 17:57:50.222  auth_group:- referer:- proxyIP:10.145.22.31 source:D:\bluecoat logs\SG_main_IP31_M10_D12175036.log type:log scBytes:0 uri_path:/ filter_result:DENIED sourceIP:10.214.64.60 content_type:- @version:1 beat.hostname:AZSG-P-EOPIRMC beat.name:AZSG-P-EOPIRMC beat.version:5.5.2 host:AZSG-P-EOPIRMC action:TCP_ERR_MISS destIP:0.0.0.0 vir_id:- offset:18,125,553 method:CONNECT input_type:log message:2017-10-12 12:22:49 228541 1
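To keep this post readable, here is just the date handling part of that configuration (a simplified sketch, not the full filter; the grok pattern that builds the mytimestamp field from the message is omitted):

    filter {
      # grok pattern for the Bluecoat access log omitted;
      # it puts the leading date and time of the line into the mytimestamp field
      date {
        match  => ["mytimestamp", "ISO8601"]
        target => "@timestamp"
      }
    }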
I found that it was because I used "ISO8601". When I used the YYYY-MM-dd format it worked fine. I have now modified the Logstash conf file to use the csv filter in place of grok, since the regex I had written for grok was not handling the different log formats well (a simplified sketch of the new filter section is below these two points). The new problems I am facing now are:
i) The time is taken from the log and put into @timestamp, but the value in the @timestamp field is 5 hours 30 minutes behind the time in the log. Please refer to the screenshot and compare the @timestamp and mytimestamp fields.
ii) When I use the date filter, the data does not show up for searching in Elasticsearch, although I can see the events being processed in Logstash since I am using stdout { codec => rubydebug }. When I remove the date filter, the data is added to Elasticsearch.
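Here is the simplified sketch of the reworked filter section (the full Bluecoat column list is trimmed, and logdate/logtime are just the names I give to the first two columns):

    filter {
      csv {
        separator => " "
        # full column list trimmed; the first two columns hold the date and the time
        columns => ["logdate", "logtime", "time_taken", "sourceIP", "status", "action"]
      }
      mutate {
        add_field => { "mytimestamp" => "%{logdate} %{logtime}" }
      }
      date {
        match => ["mytimestamp", "YYYY-MM-dd HH:mm:ss"]
      }
    }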
The time is taken from the log and put into @timestamp, but the value in the @timestamp field is 5 hours 30 minutes behind the time in the log. Please refer to the screenshot and compare the @timestamp and mytimestamp fields.
That's because @timestamp is always UTC and your timestamp presumably is IST.
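If the time in the log really is IST wall-clock time, you can make that explicit with the date filter's timezone option rather than relying on the timezone of the Logstash host; @timestamp will still be stored in UTC. A sketch:

    date {
      match    => ["mytimestamp", "YYYY-MM-dd HH:mm:ss"]
      timezone => "Asia/Kolkata"
      # @timestamp is still stored in UTC; Kibana converts it to the
      # browser's timezone when displaying it
    }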
When I use the date filter, the data does not show up for searching in Elasticsearch, although I can see the events being processed in Logstash since I am using stdout { codec => rubydebug }. When I remove the date filter, the data is added to Elasticsearch.
If Logstash can't index documents in ES it'll give you clues in its log.
If so, how do I convert it back? I can see tokens like 'Z', 'ZZ' and 'ZZZ', which I found are for the timezone. I want to add +0530 since our timezone is Asia/Kolkata. How can this be achieved?
I checked the log, but there is no data in it. It only contains lines from the startup of the Logstash agent; apart from that, there is nothing about indexing.
@magnusbaeck Thanks a lot for your timely help. I found that it was not a problem with Elasticsearch, Logstash or Kibana; the problem was me being dumb. I cannot laugh enough thinking how dumb I am. The log contained data for 14th Oct and I was searching with today's date, which probably resulted in "no results found". I searched with the time from the log and found that my data was present. Once again, thank you for your help. Now the only odd thing left is the time being in UTC. Is there a way to convert the time to the Asia/Kolkata timezone by adding 5 hours and 30 minutes?
It matters because we are using API calls to generate log statistics for the users. They might complain that they didn't use their systems during the time mentioned. Is there any other option left, or should I leave the calculations to the presentation layer?
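If we do end up keeping the conversion in the pipeline rather than in the presentation layer, one idea I am considering is to leave @timestamp in UTC and write an extra local-time field with a ruby filter. A rough sketch (the localtime field name is just something I made up):

    filter {
      # date filter as before; @timestamp stays in UTC
      ruby {
        code => "event.set('localtime', event.get('@timestamp').time.getlocal('+05:30').strftime('%Y-%m-%d %H:%M:%S'))"
      }
    }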