Performance Impact of Deleting an Index

Hello!

We have an Elasticsearch 7.5.0 cluster with 5 master nodes, 3 coordinator nodes, and 15 data nodes. Each node has a 20-core CPU (Hyper-Threading disabled), 512GB RAM, 1.8TB of SSD storage (RAID 1+0), and 10Gb network bandwidth. Each of our indices has 30 shards with 3 replicas, and we need to delete and create new indices regularly; each index has a storage size of around 700GB.
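For reference, each index is created with settings along these lines (the host and index name here are just placeholders, not our real ones):

    # create an index with 30 primary shards and 3 replicas
    curl -X PUT "localhost:9200/my-index-example" \
      -H 'Content-Type: application/json' \
      -d '{
        "settings": {
          "number_of_shards": 30,
          "number_of_replicas": 3
        }
      }'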

I use the DELETE API to remove indices we no longer need, but every time I do this the master node removes all of our data nodes from the cluster because of lagging, which ultimately turns the cluster health status red.
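The delete itself is nothing special, essentially just this (index name is a placeholder):

    # delete an index we no longer need
    curl -X DELETE "localhost:9200/my-index-example"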

To my understanding, deleting an index should be a lightweight operation with no impact on cluster health at all. However, when I queried the hot threads API during the delete operation, just before the master node lost its connection to the data nodes, I saw a lot of threads similar to this one (esd stands for Elasticsearch data node here):

106.8% (533.8ms out of 500ms) cpu usage by thread 'elasticsearch[esd07][clusterApplierService#updateTask][T#1]'
     10/10 snapshots sharing following 28 elements
       java.base@13.0.1/sun.nio.fs.UnixNativeDispatcher.rmdir0(Native Method)
       java.base@13.0.1/sun.nio.fs.UnixNativeDispatcher.rmdir(UnixNativeDispatcher.java:232)
       java.base@13.0.1/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:234)
       java.base@13.0.1/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
       java.base@13.0.1/java.nio.file.Files.delete(Files.java:1145)
       app//org.elasticsearch.core.internal.io.IOUtils$1.postVisitDirectory(IOUtils.java:218)
       app//org.elasticsearch.core.internal.io.IOUtils$1.postVisitDirectory(IOUtils.java:207)
       java.base@13.0.1/java.nio.file.Files.walkFileTree(Files.java:2821)
       java.base@13.0.1/java.nio.file.Files.walkFileTree(Files.java:2875)
       app//org.elasticsearch.core.internal.io.IOUtils.rm(IOUtils.java:207)
       app//org.elasticsearch.core.internal.io.IOUtils.rm(IOUtils.java:187)
       app//org.elasticsearch.env.NodeEnvironment.deleteShardDirectoryUnderLock(NodeEnvironment.java:520)
       app//org.elasticsearch.env.NodeEnvironment.deleteShardDirectorySafe(NodeEnvironment.java:469)
       app//org.elasticsearch.indices.IndicesService.deleteShardStore(IndicesService.java:864)
       app//org.elasticsearch.indices.store.IndicesStore$ShardActiveResponseHandler.lambda$allNodesResponded$2(IndicesStore.java:294)
       app//org.elasticsearch.indices.store.IndicesStore$ShardActiveResponseHandler$$Lambda$4911/0x00000008016cfc40.accept(Unknown Source)
       app//org.elasticsearch.cluster.service.ClusterApplierService.lambda$runOnApplierThread$0(ClusterApplierService.java:315)
       app//org.elasticsearch.cluster.service.ClusterApplierService$$Lambda$4913/0x00000008016cb440.apply(Unknown Source)
       app//org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.apply(ClusterApplierService.java:171)
       app//org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:407)
       app//org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73)
       app//org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703)
       app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
       app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
       java.base@13.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
       java.base@13.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
       java.base@13.0.1/java.lang.Thread.run(Thread.java:830)
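For completeness, the hot threads output above comes from a request roughly like this (host and parameters are just an example):

    # sample the busiest threads on every node
    curl -X GET "localhost:9200/_nodes/hot_threads?threads=5"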

We also have 2 other Elasticsearch 7.5.0 clusters with 3 master nodes, 3 coordinator nodes, and 5 data nodes each; their hardware configuration is the same as that of the 23-node cluster mentioned above. Indices on these clusters have 10 shards with 2 replicas, and deleting them has not turned the cluster health red yet, but they are also much smaller (around 100GB per index). Every time this issue happens we have to wait for Elasticsearch to recover, which causes downtime for our system :frowning:
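While it recovers there is not much we can do besides watching the cluster health, e.g. with something like this (the timeout value is arbitrary):

    # check cluster health, or block until it is green again
    curl -X GET "localhost:9200/_cluster/health?pretty"
    curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=120s"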

Should I look into adjusting cluster settings like "cluster.follower_lag.timeout" or "cluster.publish.timeout"? What else could be done to help our cluster delete indices more efficiently?
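If I understand the docs correctly these are static settings, so changing them would mean editing elasticsearch.yml on every node and restarting; something like the following, where the values are only examples and not what I intend to use:

    # elasticsearch.yml (example values only)
    cluster.follower_lag.timeout: 180s
    cluster.publish.timeout: 60s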

Thanks,
Xianhe

How many indices do you have in each cluster?

Hi Christian, thanks for the reply! We have at most 3 indices at the same time in each cluster.
