ES loses data being collected when deleting an index

ES loses data while the cluster is deleting an index, as shown in the screenshot below (20181211).

How can I fix this problem?

I mean that when I delete the index, the amount of data being collected also drops. The picture shows the amount of data collected at each point in time (per minute).

The cluster has four nodes. One of them is configured as a coordinating-only node:
node.master: false
node.data: false
node.ingest: false

The rest of the nodes use the default settings.
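For reference, the effective role of each node can be checked with the cat nodes API; a minimal example, assuming the cluster is reachable on localhost:9200 (adjust the host as needed):

curl -XGET 'http://localhost:9200/_cat/nodes?v&h=name,node.role,master'

A coordinating-only node shows "-" in the node.role column.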

Isn't it expected that the amount of data in the cluster goes down if you delete an index?

Sorry, I didn't make it clear. I mean that when I delete the index, the amount of data being collected also drops. The picture shows the amount of data collected at each point in time (per minute).

Can anyone help me?

Is there anything in the logs around that time? If indexing slows down at that time, that does not generally mean data will be lost, as clients will generally retry on failure or timeout.

ES logs:
[2018-12-20T04:02:40,847][INFO ][o.e.c.m.MetaDataDeleteIndexService] [iom] [filebeat-2018.12.12/AzYCjS0jTsy3ovBI8An13A] deleting index
[2018-12-20T04:03:10,880][WARN ][o.e.d.z.PublishClusterStateAction] [iom] timed out waiting for all nodes to process published state [3888] (timeout [30s], pending nodes: [{iom}{XWu3adjESaGrlDdqONE4-A}{SKwUJfhaT8GeO31NxV-Fbw}{172.31.24.86}{172.31.24.86:9300}, {iom}{ZEh8uJqGSV2EytseLQSz4w}{82xwXo0eQNmVV6aWfTXVTw}{172.31.24.88}{172.31.24.88:9300}])
[2018-12-20T04:05:10,947][DEBUG][o.e.a.a.i.d.TransportDeleteIndexAction] [iom] failed to delete indices [[[metricbeat-2018.12.12/S0asWes4TZyGJ6XCCjg76g]]]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[metricbeat-2018.12.12/S0asWes4TZyGJ6XCCjg76g]]) within 2m
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.3.jar:5.6.3]
at java.util.ArrayList.forEach(ArrayList.java:1255) ~[?:1.8.0_151]
at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.3.jar:5.6.3]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.3.jar:5.6.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-12-20T04:11:00,175][WARN ][o.e.c.s.ClusterService ] [iom] cluster state update task [delete-index [[filebeat-2018.12.12/AzYCjS0jTsy3ovBI8An13A]]] took [8.3m] above the warn threshold of 30s

What is the output of the cluster health API?
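For example, assuming the cluster listens on localhost:9200:

curl -XGET 'http://localhost:9200/_cluster/health?pretty'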

Our cluster usually deletes the oldest indices at 4:00 am; the logs above show it deleting the index filebeat-2018.12.12, because the cluster keeps 7 days of data. (A sketch of the delete call is below the health output.)

{
"cluster_name": "iot-elk-cluster",
"status": "green",
"timed_out": false,
"number_of_nodes": 4,
"number_of_data_nodes": 3,
"active_primary_shards": 87,
"active_shards": 175,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}
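As for the 4:00 am cleanup: it boils down to delete-index calls on the oldest daily indices. A minimal sketch, assuming curl against localhost:9200 (the exact scheduling tool is an assumption, not shown here):

curl -XDELETE 'http://localhost:9200/filebeat-2018.12.12'
curl -XDELETE 'http://localhost:9200/metricbeat-2018.12.12'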

It looks like it is timing out publishing the cluster state. Is the cluster under heavy load? Do you see any long or frequent GC? Do you have minimum_master_nodes set to 2 given that you appear to have 3 master-eligible nodes?
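Heap usage, GC counts and OS load can be checked with the nodes stats API; for example, assuming the cluster is reachable on localhost:9200:

curl -XGET 'http://localhost:9200/_nodes/stats/jvm,os?pretty'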

It seems that our cluster really is heavily loaded: each ES node has 16 CPU cores, but the CPU load is above 10, and on one node it is around 16.
GC is rare, though; each ES node has 64 GB of RAM, and the ES heap is configured to 31 GB.
minimum_master_nodes is set to 2.

Does anyone have any good ideas?

It is quite possible that the cluster is overloaded and not able to process and distribute changes to the cluster state fast enough. One way to address this would be to scale out the cluster and distribute the load across more hosts. You may also benefit from introducing 3 small dedicated master nodes. These typically do not use a lot of resources, but as they do not hold data or serve traffic they do not get overloaded and can focus on managing the cluster state.
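A sketch of the elasticsearch.yml for such a dedicated master node on 5.x, assuming three of them are added and the existing nodes are switched to node.master: false, so that minimum_master_nodes can stay at 2 (a quorum of the three dedicated masters):

node.master: true
node.data: false
node.ingest: false
discovery.zen.minimum_master_nodes: 2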

Thanks, I will try that.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.