How to avoid data loss in Logstash?

Hi All,

I have observed data loss in my Elasticsearch cluster. I am comparing the log count from two log monitoring tools: one is ELK and the other is a third-party vendor SIEM.

I have verified the syslog configuration on all network devices (ASA, Palo Alto), and the configuration is the same for both destination IPs. However, for every device the log count in the third-party vendor SIEM is higher than the log count in Elasticsearch. By my calculation there is about 90% data loss.

Any help would be appreciated.

How are the logs sent to Logstash? TCP or UDP? Have you checked the Logstash and/or Elasticsearch logs to make sure no documents are rejected by ES?

Hi @magnusbaeck, currently I am using UDP. I tried TCP as well, but it was not working.

Logstash was sending RST packets to the firewall, so I reverted to UDP.

I went through the Logstash logs and found the entry below:

[2018-09-26T05:29:53,563][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash-syslog-2018.09.25", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0xd47f1cd>], :response=>{"index"=>{"_index"=>"logstash-syslog-2018.09.25", "_type"=>"doc", "_id"=>"AWYTLMWtieAf_e_rdBMv", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [timestamp]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"Sep 26 2018 05:29:50\""}}}}}

I am not sure why it's storing Sep 26th data in the Sep 25th index. Also note that I have multiple devices sending logs in the "Sep 26 2018 05:29:50" format.

Is there any way to change the time format for all logs coming into Logstash?

But again, this applies to only one type of device; for other devices like the ASA, I am still missing logs according to my comparison with the other tool.

One more question:

Is there any chance that Logstash is dropping packets because the total RAM consumption of the server is at 98%?

Thanks

Logstash was sending RST packets to the firewall, so I reverted to UDP.

Logstash shouldn't normally close connections like that.

I am not sure why it's storing Sep 26th data in the Sep 25th index

Probably because UTC is used for index names. The elasticsearch output picks the daily index from the event's @timestamp, which is stored in UTC; for example, Sep 26 05:29 in a UTC+5:30 timezone is still Sep 25 23:59 UTC, so the event lands in the 2018.09.25 index.

Is there any way to change the time format for all logs coming into Logstash?

You should use a date filter to parse the timestamp into a format ES recognizes; it will also set @timestamp correctly, so events end up in the right daily index.
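
For the devices logging in that format, something like this minimal sketch should be close (the source field name timestamp is taken from the mapping error above; the timezone option is an assumption, set it to whatever zone your devices actually log in):

filter {
  date {
    # Parse e.g. "Sep 26 2018 05:29:50", the format from the mapping error
    match => ["timestamp", "MMM dd yyyy HH:mm:ss"]
    # Assumption: replace with the timezone your devices log in
    timezone => "UTC"
  }
}

On success the parsed value replaces @timestamp, which is what the elasticsearch output uses to choose the index.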

Is there any chance that Logstash is dropping packets because the total RAM consumption of the server is at 98%?

That's a possibility, but I'd start by addressing the problems listed in the Logstash log. If things are working fine, Logstash typically won't log much at all.
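
Also, since you're receiving over UDP: messages can be lost when the udp input's internal queue or the kernel's socket receive buffer fills up under load. As a hedged sketch (port 514 and all the sizes here are assumptions you'd tune for your own traffic), you can enlarge those buffers:

input {
  udp {
    # Assumption: standard syslog port
    port => 514
    # Assumption: raise the input's internal queue from the default 2000 events
    queue_size => 20000
    # Assumption: more threads to drain the queue (default is 2)
    workers => 4
    # Assumption: larger kernel socket receive buffer; the OS limit
    # (net.core.rmem_max on Linux) may also need to be raised
    receive_buffer_bytes => 16777216
  }
}

None of this will help if the box is starved for memory, though; at 98% RAM usage I'd look at what's consuming it before tuning buffers.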

Hi @magnusbaeck,

Thanks for the quick response.

Yes, I agree about the problem listed in the Logstash log, but I am still not clear why I am losing data for the other devices, which have a proper @timestamp.

Hi @magnusbaeck,

I want to highlight one more point which I forgot to mention in my previous comment.

Total RAM of the machine: 16 GB
Speed: 667 MHz
Interface type: DDR2
