I use Elasticsearch and Kibana, and I submit my data with the bulk API from a VB.NET program.
I did not make any special configuration in Elasticsearch.
Here is the content of the file elasticsearch.yml:
On average, I send 600 000 lines a day.
In recent months I have noticed that I periodically lose data every month: instead of 600 thousand lines on average, I have:
30-06-2019 : 188,626
31-07-2019 : 221,839
31-08-2019 : 206,808
30-09-2019 : 184,473
as shown in this screenshot
Do you have an explanation for this phenomenon?
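To put numbers on the drop, the sketch below takes the four counts listed above, checks whether each date is the last day of its month, and computes the shortfall against the 600,000-line average. Treating the average as the expected daily count is an assumption made for illustration, and the names EXPECTED_PER_DAY, observed and report are hypothetical.

```python
import calendar
from datetime import datetime

# Assumption: the stated daily average is used as the expected count.
EXPECTED_PER_DAY = 600_000

# Counts reported in the post, keyed by date (DD-MM-YYYY).
observed = {
    "30-06-2019": 188_626,
    "31-07-2019": 221_839,
    "31-08-2019": 206_808,
    "30-09-2019": 184_473,
}


def is_month_end(date_str):
    """True if the date is the last calendar day of its month."""
    d = datetime.strptime(date_str, "%d-%m-%Y")
    return d.day == calendar.monthrange(d.year, d.month)[1]


report = {
    date: {
        "month_end": is_month_end(date),
        "shortfall": EXPECTED_PER_DAY - count,
    }
    for date, count in observed.items()
}

for date, row in report.items():
    print(date, row)
```

All four dates fall on the last day of their month, and each is short by roughly 378,000 to 416,000 lines relative to the average.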
Do you analyse each bulk response to verify there were no errors, and retry if there are? What version of Elasticsearch are you using? How large is your cluster and what does your elasticsearch.yml file look like?
Thank you for your reply.
1- Yes, I analyzed the bulk responses and found no problem; the response looks like:
2- I use version 6.5.1 of Elasticsearch
3- Currently I have 242 GB of data
4- elasticsearch.yml :
5- I have a data loss at the end of every month
What is the output of the _cluster/health API?
Are you using time-based indices, e.g. daily or monthly ones?
Is there anything in the Elasticsearch logs around the times you highlighted?
Are you keeping statistics on how much you have indexed on the client side?
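As a rough illustration of what client-side statistics could look like, here is a minimal sketch that tallies, per day, how many documents were sent and how many were acknowledged with a 2xx status in the bulk responses. The class IndexingStats and the sample response are hypothetical; the only thing taken from Elasticsearch is the general shape of a bulk response (an "items" array with one entry per action, each carrying a "status").

```python
from collections import defaultdict


class IndexingStats:
    """Per-day tally of documents sent and documents acknowledged."""

    def __init__(self):
        self.sent = defaultdict(int)
        self.acked = defaultdict(int)

    def record(self, day, bulk_response):
        items = bulk_response.get("items", [])
        self.sent[day] += len(items)
        for item in items:
            # Each item has a single key naming the action (index, create, ...).
            result = next(iter(item.values()))
            if 200 <= result.get("status", 0) < 300:
                self.acked[day] += 1

    def missing(self, day):
        """Documents sent on that day that were not acknowledged."""
        return self.sent[day] - self.acked[day]


# Hypothetical response: one document indexed, one rejected.
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
    ],
}

stats = IndexingStats()
stats.record("2019-09-30", sample)
print(stats.missing("2019-09-30"))
```

Comparing the acknowledged total for a day with the document count Elasticsearch reports for the same day would show whether the gap appears before or after the data reaches the cluster.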
1- Output of the _cluster/health API:
{
  "cluster_name" : "Mycluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 17,
  "active_shards" : 17,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 53.125
}
2- No time-based indices are used; on the contrary, I keep my data for 3 years to produce annual statistics
3- MyCluster-2019-09-30.log: the majority of the log file consists of lines of this kind
4- Yes, as I have already said, my client program sends 600 thousand lines per day on average; even on the days when there is a problem, it sent the data at the same rate.
Any kind of help will be appreciated.
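The active_shards_percent_as_number value in the health output above can be reproduced from the shard counts. The sketch below assumes the percentage is active shards divided by the sum of active, initializing and unassigned shards, which matches the reported figure here; the variable names are hypothetical.

```python
# Shard counts copied from the _cluster/health output in this thread.
health = {
    "status": "yellow",
    "number_of_data_nodes": 1,
    "active_primary_shards": 17,
    "active_shards": 17,
    "initializing_shards": 0,
    "unassigned_shards": 15,
}

# Assumption: total = active + initializing + unassigned shards.
total_shards = (
    health["active_shards"]
    + health["initializing_shards"]
    + health["unassigned_shards"]
)
active_percent = 100.0 * health["active_shards"] / total_shards

# Every active shard is a primary, so no replica copy is currently assigned.
replicas_active = health["active_shards"] - health["active_primary_shards"]

print(total_shards, active_percent, replicas_active)
```

A yellow status means every primary shard is assigned but some replicas are not. On a single data node a replica cannot be placed alongside its primary, so the 15 unassigned shards are expected with default replica settings and do not by themselves indicate missing documents.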
Are you checking the responses for your bulk requests in your application? The cluster might be rejecting requests due to load, or failing certain requests, which means that you will have to retry those. In particular, are you checking that the bulk response contains "errors": false?
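A minimal sketch of that check, assuming the bulk response has already been parsed into a dictionary: a bulk request can return HTTP 200 while individual items fail, so the top-level "errors" flag and the per-item "status" and "error" fields both need inspecting. The function names and the sample response are hypothetical, and treating only status 429 as retryable is an assumption of this example.

```python
def failed_positions(bulk_response):
    """Return (position, status, error) for every failed bulk item."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for pos, item in enumerate(bulk_response.get("items", [])):
        # Each item has a single key naming the action (index, create, ...).
        result = next(iter(item.values()))
        if "error" in result:
            failures.append((pos, result.get("status"), result["error"]))
    return failures


def retryable(failures):
    # Assumption: only 429 (request rejected under load) is retried unchanged.
    return [pos for pos, status, _ in failures if status == 429]


# Hypothetical response: one success, one rejection, one mapping failure.
response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
        {"index": {"_id": "3", "status": 400,
                   "error": {"type": "mapper_parsing_exception"}}},
    ],
}

failures = failed_positions(response)
print([(pos, status) for pos, status, _ in failures])
print(retryable(failures))
```

Positions in the response line up with the order of actions in the request, so the returned positions identify which documents to resend. Items failing with other statuses, such as a 400 mapping error, would fail again unchanged and are better logged for inspection.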