ES loses data when the cluster is deleting an index, as shown below.
How can I fix this problem?
I mean, when I delete the index, the amount of data being collected also drops. The picture shows the amount of data collected per minute.
The cluster has four nodes; one of them is configured as a coordinating-only node:
node.master: false
node.data: false
node.ingest: false
Sorry, I didn't make it clear. I mean, when I delete the index, the amount of data being collected also drops. The picture shows the amount of data collected per minute.
Is there anything in the logs around that time? If indexing slows down at that time, this generally does not mean that data will be lost, as clients will generally retry on failure or timeout.
ES logs:
[2018-12-20T04:02:40,847][INFO ][o.e.c.m.MetaDataDeleteIndexService] [iom] [filebeat-2018.12.12/AzYCjS0jTsy3ovBI8An13A] deleting index
[2018-12-20T04:03:10,880][WARN ][o.e.d.z.PublishClusterStateAction] [iom] timed out waiting for all nodes to process published state [3888] (timeout [30s], pending nodes: [{iom}{XWu3adjESaGrlDdqONE4-A}{SKwUJfhaT8GeO31NxV-Fbw}{172.31.24.86}{172.31.24.86:9300}, {iom}{ZEh8uJqGSV2EytseLQSz4w}{82xwXo0eQNmVV6aWfTXVTw}{172.31.24.88}{172.31.24.88:9300}])
[2018-12-20T04:05:10,947][DEBUG][o.e.a.a.i.d.TransportDeleteIndexAction] [iom] failed to delete indices [[[metricbeat-2018.12.12/S0asWes4TZyGJ6XCCjg76g]]]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[metricbeat-2018.12.12/S0asWes4TZyGJ6XCCjg76g]]) within 2m
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.3.jar:5.6.3]
at java.util.ArrayList.forEach(ArrayList.java:1255) ~[?:1.8.0_151]
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.3.jar:5.6.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.3.jar:5.6.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-12-20T04:11:00,175][WARN ][o.e.c.s.ClusterService ] [iom] cluster state update task [delete-index [[filebeat-2018.12.12/AzYCjS0jTsy3ovBI8An13A]]] took [8.3m] above the warn threshold of 30s
Our cluster deletes old indices at 4:00 am every day; the logs above show the cluster deleting the index filebeat-2018.12.12 (the cluster keeps 7 days of data).
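For reference, a nightly cleanup like this is often just a cron job along these lines. This is only a sketch: the `localhost:9200` endpoint is a placeholder, GNU `date` is assumed, and the `filebeat-YYYY.MM.DD` naming pattern is taken from the log lines above.

```shell
#!/bin/sh
# Compute the name of the index that has aged out of the 7-day window.
# Index names follow the filebeat-YYYY.MM.DD pattern seen in the logs;
# GNU date is assumed for the '-d' flag.
OLD_INDEX="filebeat-$(date -d '7 days ago' +%Y.%m.%d)"

# The actual delete call (commented out here; the endpoint is a placeholder):
# curl -s -XDELETE "http://localhost:9200/${OLD_INDEX}"
echo "would delete index: ${OLD_INDEX}"
```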
It looks like it is timing out publishing the cluster state. Is the cluster under heavy load? Do you see any long or frequent GC? Do you have minimum_master_nodes set to 2 given that you appear to have 3 master-eligible nodes?
It seems that our cluster really is heavily loaded: each ES node has 16 CPU cores, but the CPU load is above 10, and on one node it is around 16.
GC is rare, though; each ES node has 64 GB of memory, and the ES heap is configured to 31 GB.
minimum_master_nodes is set to 2.
It is quite possible that the cluster is overloaded and not able to process and distribute changes to the cluster state fast enough. One way to address this would be to scale out the cluster and distribute the load across more hosts. You may also benefit from introducing 3 small dedicated master nodes. These typically do not use a lot of resources, but as they do not hold data or serve traffic they do not get overloaded and can focus on managing the cluster state.
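For reference, a dedicated master node in 5.x would be configured roughly like this in elasticsearch.yml. This is a sketch only; discovery hosts and heap sizing depend on the environment, and the quorum value assumes the three dedicated masters become the only master-eligible nodes.

```yaml
# elasticsearch.yml for a dedicated master-eligible node (ES 5.x settings):
# it can become master but holds no data and runs no ingest pipelines.
node.master: true
node.data: false
node.ingest: false

# With 3 master-eligible nodes, the quorum is 2.
discovery.zen.minimum_master_nodes: 2
```

The existing data nodes would then be set to `node.master: false` so that cluster-state management moves entirely onto the lightly loaded dedicated masters.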