Elasticsearch Index with INDEX_CREATED status progress

Hello,

I have a simple question: is there an API that shows the current progress of an Elasticsearch index during phases like 'INDEX_CREATED'?
I know there is an API to check ongoing recoveries, but since the index is newly created and stuck in the 'INITIALIZING' state, I can't find it using the recovery API.
My cluster version is 6.6.1.
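
For reference, something like the following (the columns are standard _cat/shards headers) lists which shards are INITIALIZING, but gives no progress figure for them:

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node&s=state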

When you create an index, normally everything is ready to use by the time you get the response from Elasticsearch. It should be super fast, BTW.
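
If you want to sanity-check that, timing the creation of a throwaway index should confirm it (the index name here is just a placeholder, and this assumes Elasticsearch is listening on localhost:9200):

$ time curl -X PUT 'http://localhost:9200/creation-speed-test?pretty'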

Upgrade at least to the latest 6.8, or better, move to 7.10.

The upgrade is in mid-term planning. But isn't there any API like GET /_cat/recovery that shows the progress of new indices?
I have indices stuck in the INITIALIZING state for days. I know the reason: a slow storage back-end.
I just need something to visualize what percentage has been allocated and how much is left.
Are there APIs in newer versions to monitor the allocation progress of newly created indices?
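
The cluster health response does count them (initializing_shards and unassigned_shards are standard fields, reported per index with level=indices), but it doesn't say how far along each shard is:

GET _cluster/health?level=indices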

In 6.8.6 (the oldest version I have lying around) a brand-new empty shard is ~8 files totalling less than 1kB in size:

$ ls -al $(find elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0 -type f)
-rw-r--r--  1 davidturner  staff   72  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/_state/retention-leases-0.st
-rw-r--r--  1 davidturner  staff  125  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/_state/state-0.st
-rw-r--r--  1 davidturner  staff  230  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/index/segments_2
-rw-r--r--  1 davidturner  staff    0  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/index/write.lock
-rw-r--r--  1 davidturner  staff   88  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog-1.ckp
-rw-r--r--  1 davidturner  staff   55  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog-1.tlog
-rw-r--r--  1 davidturner  staff   55  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog-2.tlog
-rw-r--r--  1 davidturner  staff   88  7 Dec 17:07 elasticsearch-6.8.6/data-0/nodes/0/indices/Rpo6dMXwQ9a6XIsc5mdnkg/0/translog/translog.ckp

There's something very very wrong if your storage takes days to put these on disk. It really doesn't make sense to break down the creation of these files and report on progress towards creating them.

How many indices and shards do you have in the cluster? How many new indices are being initialized at any given point in time?
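
(If it's easier to pull those numbers from the cluster itself, cluster health reports the totals and _cat/indices lists the shard counts per index:)

GET _cluster/health
GET _cat/indices?v&h=index,pri,rep,health,status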

It really doesn't make sense to break down the creation of these files and report on progress towards creating them.

I totally understand your point. I've never worried before about new shards in the INITIALIZING state, but the case I'm investigating pushes me to look for such an option, if it exists.
For the recovering shards I use:
GET _cat/recovery?active_only=true
and I can visualize what is happening and the rate of recovery, which in my case is 1.7 KBps. I know it may seem like I'm kidding or don't know what is happening on the VMs, but I've checked the specs of the pods where the data nodes run, the worker VMs where those pods are scheduled, and the data pod volumes on the back-end storage. The bottleneck is the data volume on the back-end storage, which shows high I/O utilization and long read/write request queues.
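
For completeness, the form of the query I use is roughly this (the columns are the documented _cat/recovery headers for the byte counts and percentages):

GET _cat/recovery?v&active_only=true&h=index,shard,stage,source_node,target_node,bytes_recovered,bytes_total,bytes_percent,files_percent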

How many indices and shards do you have in the cluster? How many new indices are being initialized at any given point in time?

So far I have 233 indices with 1476 shards. Since there are two types of indices, I'm not detailing how many shards each type has.
I'm using the default allocation and recovery parameters:

"transient" : {
    "cluster" : {
      "routing" : {
        "rebalance" : {
          "enable" : "all"
        },
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "2",
          "node_initial_primaries_recoveries" : "4",
          "enable" : "all",
          "node_concurrent_outgoing_recoveries" : "2",
          "allow_rebalance" : "indices_all_active",
          "cluster_concurrent_rebalance" : "1",
          "node_concurrent_recoveries" : "2"
        }
      }
    }
}
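
(These are the values reported by the cluster settings API; the full picture, including defaults, can be pulled with include_defaults:)

GET _cluster/settings?include_defaults=true&flat_settings=true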

Is that a typo? Punched tape could do better than that :slight_smile:

If it's not a typo then your storage sounds very broken, or overloaded to the point of failure, and I don't really see how it would make things better to observe the progress of initialising an empty shard. This cluster isn't going to be able to do any meaningful work like, say, indexing more than a couple of docs per second, even if all the shards were present and initialized. I don't see a way forward except to get to the bottom of this dreadful performance.

If it's overloaded, what is causing the load? Is it just Elasticsearch or are you sharing with other IO-heavy applications? It's best to isolate each Elasticsearch node from everything else.
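
If you want a number for the raw storage performance that is independent of Elasticsearch, a synthetic write test run directly against the data volume would be a useful data point. Something along these lines with fio, for example (the directory, size, runtime and queue depth are placeholders to adjust for your setup):

$ fio --name=es-volume-check --directory=/path/to/data/volume --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --size=1g --iodepth=16 --runtime=60 --time_based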

It's not a typo; this is what I have on the storage right now. Elasticsearch in my setup runs in pods, and those pods are scheduled onto worker nodes with an affinity rule so that each Elasticsearch data pod runs on a separate worker node.
The storage class for the pods is backed by LUNs created automatically on an HPE 3PAR back-end.
I've checked the data volumes for the pods, and utilization is almost always above 90%, as you can see below, with slow write speeds and long request queues:
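
(The extended device stats below are iostat output; an invocation along these lines produces this format, though the interval and device filter here are just an example:)

$ iostat -xm 5 vdc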

Linux 3.10.0-1062.1.1.el7.x86_64 (bcmt-ovs-worker-0) 	12/08/2020 	_x86_64_	(16 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.01     5.33   55.28   19.18     0.98     7.86   243.18     0.36    7.32    5.29   13.17   0.90   6.72

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     0.00  243.67    0.00     1.48     0.00    12.42     1.76    7.19    7.19    0.00   3.69  89.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00    21.33  146.00   44.33     2.53    18.49   226.09     2.58   13.64    7.80   32.87   4.79  91.23

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00    41.00   92.00   53.00     2.46    19.98   316.87     2.70   18.71   11.33   31.53   5.63  81.70

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00    35.67   89.67    0.67     4.19     0.14    98.13     0.16    1.85    1.85    2.00   1.71  15.47

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00   153.00   37.33  175.33     1.80    31.04   316.28    11.64   57.76   24.69   64.80   4.32  91.77

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00   132.00   70.67   72.33     1.24     2.63    55.46     0.99    6.96   13.06    1.00   5.67  81.07

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     0.00   68.67   67.67     1.65    21.18   342.87     2.66   19.08   18.37   19.80   6.22  84.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00    98.00  116.67   13.67     0.46     3.51    62.53     1.77   13.96   15.32    2.41   7.10  92.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     0.00   75.00    0.00     0.29     0.00     8.00     1.79   23.64   23.64    0.00  12.36  92.67

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     9.67   71.67   15.67     0.28     6.76   164.98     2.01   22.98   24.57   15.70  10.58  92.37

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     5.67   98.33    0.67     0.43     0.02     9.35     1.89   19.45   19.58    1.00   9.61  95.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     0.00  104.67   11.00     0.41     5.05    96.67     3.50   30.13   16.76  157.33   7.91  91.47

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     7.00  120.00   41.33     0.51    18.70   243.85     3.55   21.53   15.03   40.39   5.85  94.37

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vdc               0.00     0.00  115.33   48.67     0.65    22.50   289.09     3.55   22.40   16.61   36.14   5.94  97.47

So I can guarantee that Elasticsearch has the underlying infrastructure to itself, without sharing. I'm trying to consult storage experts about this; maybe something else is loading the back-end storage box.
