Wondering if there is any suitable way to check for data loss when sending data via Logstash to an Elasticsearch endpoint.
Ways I know:
1. Check the logstash.log file to see if any errors are there.
   1.1 Works, but seems to need a lot of manual work, and we cannot do much about Timeout errors.
2. Check the number of lines generated by the service, and validate that the same number of lines show up in Kibana.
   2.1 Works, but hard to track all the files continuously.
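The line-count comparison above could be automated with a small script. A minimal sketch, assuming a local log file and the Elasticsearch `_count` API; the endpoint, index name, and file path are placeholders:

```python
import json
import urllib.request

def count_local_lines(path):
    """Count the lines the service wrote to a local log file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)

def count_indexed_docs(es_url, index):
    """Ask Elasticsearch how many documents the index holds (_count API)."""
    with urllib.request.urlopen(f"{es_url}/{index}/_count") as resp:
        return json.load(resp)["count"]

def loss_percent(sent, indexed):
    """Percentage of events that never made it into the index."""
    if sent == 0:
        return 0.0
    return max(0.0, (sent - indexed) / sent * 100.0)
```

Run periodically (e.g. from cron), this would give the percentage tracked per file; alerting on a threshold covers the notification requirement.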
What I need:
1. An automatic way to track the percentage of files that are sent to Elasticsearch correctly.
2. A notification when there is severe data loss.
Logstash version: 2.1
Thanks!
Example of a Timeout error (this is AWS Elasticsearch, which may differ from open-source Logstash):
{:timestamp=>"2018-04-02T20:51:22.822000+0000", :message=>"Attempted to send a bulk request to Elasticsearch configured at '[\"https://fake_endpoint.es.amazonaws.com:443\"]', but an error occurred and it failed! Are you sure you can reach elasticsearch from this machine using the configuration provided?", :client_config=>{:hosts=>["https://fake_endpoint.es.amazonaws.com:443"], :region=>"us-east-1", :aws_access_key_id=>nil, :aws_secret_access_key=>nil, :aws_odin_material_set=>nil, :transport_options=>{:request=>{:open_timeout=>0, :timeout=>60}, :proxy=>nil}, :transport_class=>Elasticsearch::Transport::Transport::HTTP::AWS, :logger=>nil, :tracer=>nil, :reload_connections=>false, :retry_on_failure=>false, :reload_on_failure=>false, :randomize_hosts=>false}, :error_message=>"fake_endpoint.es.amazonaws.com:443 failed to respond", :error_class=>"Faraday::ClientError", :backtrace=>nil, :level=>:error}
{:timestamp=>"2018-04-02T20:51:22.824000+0000", :message=>"Failed to flush outgoing items", :outgoing_count=>1, :exception=>"Faraday::ClientError", :backtrace=>nil, :level=>:warn}
I don't know what the best method is, but I can speak to what we've done to try to track data loss and issues getting data to Elasticsearch.
First off, there are two main areas we wanted to monitor:
1. Elasticsearch performance issues, resulting in 5xx responses.
2. Bad documents, resulting in 4xx responses (if I remember correctly).
In the first case, we have a queue in the middle of our Logstash layer, so we have a monitor on this queue. If the queue starts building up, it's a good indication that Elasticsearch ingest cannot keep up with the volume of documents being sent. Logstash will ultimately retry these messages, so there shouldn't be any data loss. However, if you don't have a queue at all, Logstash can eventually back up and you will lose messages.
In the second case, documents can be dropped if they're in an invalid format, have mapping conflicts, etc. Here you can use the Logstash DLQ to write these bad documents to a file: https://www.elastic.co/guide/en/logstash/current/dead-letter-queues.html
You could monitor this queue by size, or even set up an additional pipeline to pick up these DLQ documents, process them into a 'quarantine' index, and monitor the size of that.
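To sketch what that quarantine setup might look like (note the DLQ requires Logstash 5.5 or later, so it wouldn't be available on 2.1 without upgrading; paths, hosts, and the index name below are placeholders):

```
# logstash.yml -- enable the dead letter queue
dead_letter_queue.enable: true

# quarantine pipeline -- reread DLQ entries and index them for inspection
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"   # default DLQ location
    commit_offsets => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]                     # placeholder endpoint
    index => "quarantine-%{+YYYY.MM.dd}"
  }
}
```

A document count or disk-size alert on the quarantine index then tells you both that loss is happening and, via the DLQ metadata, why each document was rejected.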
We also have a basic little application that sends x events to Logstash every y minutes, and then checks that all of those events are available in Kibana within a certain period of time.
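A minimal sketch of such a canary checker, assuming a Logstash tcp input with a json_lines codec and a direct Elasticsearch query; all hosts, ports, and the index pattern are placeholder assumptions:

```python
import json
import socket
import time
import urllib.request
import uuid

def make_canary_events(n, run_id):
    """Build n synthetic events tagged with a unique run id."""
    return [json.dumps({"canary": True, "run_id": run_id, "seq": i})
            for i in range(n)]

def send_to_logstash(events, host="localhost", port=5000):
    """Ship events to an assumed Logstash tcp/json_lines input."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(("\n".join(events) + "\n").encode())

def count_in_elasticsearch(es_url, index, run_id):
    """Count how many canary events for this run id were indexed."""
    query = json.dumps({"query": {"term": {"run_id": run_id}}}).encode()
    req = urllib.request.Request(
        f"{es_url}/{index}/_count", data=query,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["count"]

if __name__ == "__main__":
    run_id = uuid.uuid4().hex
    events = make_canary_events(100, run_id)
    send_to_logstash(events)
    time.sleep(60)  # allow the pipeline to flush
    indexed = count_in_elasticsearch("http://localhost:9200", "logs-*", run_id)
    if indexed < len(events):
        print(f"ALERT: {len(events) - indexed} of {len(events)} "
              f"canary events missing")
```

The unique run id keeps each check independent, so a delayed batch from an earlier run can't mask loss in the current one.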
I appreciate this isn't a direct answer, but I don't think there's a simple, clean way to measure the number of documents lost. I'd probably focus instead on increasing resiliency so the messages are persisted, and monitor that instead.
Appreciate your suggestions; yeah, the DLQ sounds great!
Just one more follow-up on the first case: can you share some more details about the queue in the middle of the Logstash layer? How do I create this queue? Is it a feature provided by Logstash? Any references would be appreciated.
Another option is to have two Logstash layers: one that simply receives events and puts them on a queue (e.g. Kafka, Redis, etc.), and another layer that reads from this queue, applies filters, and then outputs to Elasticsearch.
Personally I wouldn't look into adding a persistent queue just yet; I'd first figure out whether the data loss is due to bad documents or to Elasticsearch performance. Adding the DLQ should help you find out: you'll be able to see any documents that have been rejected, and why.