I have a simple question: Is there an API to show current progress on an Elasticsearch index during different phases like 'INDEX_CREATED'?
I know there is an API to check current recoveries, but since the index is newly created and stuck in the INITIALIZING state, I can't find it using the recovery API.
My cluster version is 6.6.1.
An upgrade is in mid-term planning. But isn't there any API, like GET /_cat/recovery, to show the progress of new indices?
I have indices stuck in the INITIALIZING state for days. I know the reason: a slow storage back-end.
I just need something to visualize what percentage has been allocated and how much is left.
Are there APIs in newer versions to monitor the allocation progress of newly created indices?
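For recoveries that do report progress, the output of GET _cat/recovery?format=json can be post-processed programmatically. Below is a minimal Python sketch of summarizing it per shard; the field names are as I recall them from 6.x and the sample numbers are invented, so treat it as an assumption rather than a reference. (For shards that are stuck rather than progressing, GET _cluster/allocation/explain is usually more informative than a percentage.)

```python
import json

def recovery_progress(rows):
    """Map each (index, shard) to its recovery stage and bytes percentage.

    `rows` is the parsed JSON list returned by GET _cat/recovery?format=json.
    """
    progress = {}
    for row in rows:
        key = f"{row['index']}[{row['shard']}]"
        progress[key] = (row["stage"], row["bytes_percent"])
    return progress

# Hypothetical response for a single shard still in the "index" stage:
sample = json.loads("""[
  {"index": "logs-2021.01", "shard": "0", "stage": "index",
   "bytes_recovered": "1048576", "bytes_total": "2496000",
   "bytes_percent": "42.0%"}
]""")

print(recovery_progress(sample))
```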
In 6.8.6 (the oldest version I have lying around) a brand-new empty shard is ~8 files totalling less than 1kB in size:
$ ls -al $(find elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0 -type f)
-rw-r--r-- 1 davidturner staff 72 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/_state/retention-leases-0.st
-rw-r--r-- 1 davidturner staff 125 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/_state/state-0.st
-rw-r--r-- 1 davidturner staff 230 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/index/segments_2
-rw-r--r-- 1 davidturner staff 0 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/index/write.lock
-rw-r--r-- 1 davidturner staff 88 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog-1.ckp
-rw-r--r-- 1 davidturner staff 55 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog-1.tlog
-rw-r--r-- 1 davidturner staff 55 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog-2.tlog
-rw-r--r-- 1 davidturner staff 88 7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog.ckp
There's something very, very wrong if your storage takes days to put these on disk. It really doesn't make sense to break down the creation of these files and report on progress towards creating them.
I totally understand your point. I've never worried before about new shards in the INITIALIZING state. But the case I'm investigating pushes me to think of such an option, if it exists.
For the recovering shards I use:
GET _cat/recovery?active_only=true
and I can visualize what is happening and the rate of recovery, which in my case is 1.7 kB/s. I know it may seem like I'm kidding or don't know what is happening on the VMs. But I've checked the specs of the pods where the data nodes are running, the worker VMs where the pods are scheduled, and the data pod volumes on the back-end storage. The bottleneck is the data volume on the back-end storage, which has high I/O and long read/write request queues.
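At that rate I can at least estimate how long a given recovery will take. A back-of-the-envelope sketch; only the 1.7 kB/s rate comes from my measurements, and the shard size below is an invented example:

```python
# Rough ETA for a recovering shard at a steady observed rate of 1.7 kB/s.
RATE_BYTES_PER_SEC = 1.7 * 1024

def eta_seconds(bytes_total, bytes_recovered, rate=RATE_BYTES_PER_SEC):
    """Seconds left to copy the remaining bytes at the given rate."""
    return (bytes_total - bytes_recovered) / rate

# A hypothetical 10 GiB shard that is 40% recovered:
total = 10 * 1024**3
remaining = eta_seconds(total, int(total * 0.4))
print(f"{remaining / 86400:.1f} days left")  # ~42.8 days at 1.7 kB/s
```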
How many indices and shards do you have in the cluster? How many new indices are being created at any given point in time?
So far I have 233 indices with 1476 shards. Since there are two types of indices, I'm not detailing how many shards each type has.
I'm using the default allocation and recovery parameters:
Is that a typo? Punched tape could do better than that.
If it's not a typo then your storage sounds very broken, or overloaded to the point of failure, and I don't really see how it would make things better to observe the progress of initialising an empty shard. This cluster isn't going to be able to do any meaningful work like, say, indexing more than a couple of docs per second, even if all the shards were present and initialized. I don't see a way forward except to get to the bottom of this dreadful performance.
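To put that figure in perspective, here is a quick sanity check using the file sizes from the empty-shard listing above:

```python
# Even at 1.7 kB/s, writing the ~713 bytes of files that make up a
# brand-new empty shard should take well under a second, so a days-long
# INITIALIZING state points at per-operation latency (e.g. fsync stalls
# or deep request queues), not raw throughput.
empty_shard_bytes = 72 + 125 + 230 + 0 + 88 + 55 + 55 + 88  # sizes from the listing
rate_bytes_per_sec = 1.7 * 1024
print(f"{empty_shard_bytes / rate_bytes_per_sec:.2f} s")  # ~0.41 s
```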
If it's overloaded, what is causing the load? Is it just Elasticsearch or are you sharing with other IO-heavy applications? It's best to isolate each Elasticsearch node from everything else.
It's not a typo. This is what I have now on the storage. In my setup Elasticsearch runs in pods, and those pods are scheduled onto worker nodes with an affinity rule so that each Elasticsearch data pod runs on a separate worker node.
The storage class for the pods uses LUNs created automatically on the HPE 3PAR back-end.
I've checked the data volumes for the pods, and utilization is above 90%, as you can see below, with slow write speeds and long queue sizes:
So I can guarantee that Elasticsearch has the underlying infrastructure to itself, without sharing. I'm trying to consult storage experts on this; maybe something else is utilizing the back-end storage box.