Stats endpoints response slow

Elasticsearch: 7.5.1
Infrastructure: Azure AKS
Storage: Standard SSD


name                   m role ip ramMax ramPercent ramCurrent heapMax heapPercent heapCurrent diskTotal diskUsed cpu uptime iic
elasticsearch-master-1 * dilm    62.8gb         66     41.2gb  14.9gb          50       7.5gb       2tb    1.5tb   5   1.2d   0
elasticsearch-master-0 - dilm    62.8gb         60     37.7gb  14.9gb          68      10.2gb       2tb    1.3tb   5   1.2d   0
elasticsearch-master-2 - dilm    62.8gb         67       42gb  14.9gb          46       6.9gb       2tb    1.5tb   4   1.1d   0


shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
   999        1.5tb     1.5tb    449.1gb        2tb           78         elasticsearch-master-1
   999        1.5tb     1.5tb    499.9gb        2tb           75         elasticsearch-master-2
   999        1.3tb     1.3tb    717.3gb        2tb           65         elasticsearch-master-0

Hello Team,
After a recent reboot of our ELK cluster, the node and index stats endpoints eventually became very slow for one of the 3 nodes in the cluster (elasticsearch-master-2).
I narrowed the slowness down to one specific metric, the translog. Examples below:

GET /_nodes/elasticsearch-master-2/stats/indices/translog

GET /_all/_stats/translog

According to _cat/tasks, a single task takes about 10-15s to process.
All other metrics return results almost immediately for the affected node.
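For anyone who wants to repeat the per-metric narrowing described above, here is a minimal sketch that times one node stats request per indices metric. The base URL, the metric list, and the helper names (`node_stats_url`, `time_call`, `time_metrics`, `http_get`) are my own assumptions for illustration, not part of any Elasticsearch client, and the sketch assumes an unauthenticated HTTP endpoint.

```python
import time
from urllib.request import urlopen

# Assumptions: adjust the base URL and the metric list for your cluster.
BASE_URL = "http://localhost:9200"
METRICS = ["docs", "store", "indexing", "search", "segments", "translog"]


def node_stats_url(base_url, node, metric):
    """Build the node stats URL for a single indices metric."""
    return f"{base_url}/_nodes/{node}/stats/indices/{metric}"


def time_call(fn):
    """Return the wall-clock seconds spent in fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def time_metrics(base_url, node, metrics, fetch):
    """Issue one request per metric via fetch(url) and return {metric: seconds}."""
    timings = {}
    for metric in metrics:
        url = node_stats_url(base_url, node, metric)
        timings[metric] = time_call(lambda: fetch(url))
    return timings


def http_get(url):
    """Plain GET that reads and discards the response body."""
    with urlopen(url, timeout=60) as resp:
        resp.read()


if __name__ == "__main__":
    timings = time_metrics(BASE_URL, "elasticsearch-master-2", METRICS, http_get)
    # Slowest metric first.
    for metric, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{metric:10s} {seconds:6.2f}s")
```

Requesting one metric at a time keeps a slow metric from hiding behind the combined response time of a full stats call.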

We use elasticsearch_exporter to send metrics to Prometheus. With the exporter enabled, the cluster becomes impossible to use, judging by _cat/tasks.
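To spot the long-running tasks without reading the raw task list, the JSON returned by `GET /_tasks` can be filtered by running time. This is only a sketch: `slow_tasks` is a hypothetical helper, the 5-second threshold is arbitrary, and the sample response below is made up to mirror the shape of the tasks API output (`nodes`, then `tasks`, each with `action` and `running_time_in_nanos`).

```python
def slow_tasks(tasks_response, min_seconds=5.0):
    """Return (node_name, action, seconds) for tasks running at least min_seconds."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task in node.get("tasks", {}).values():
            seconds = task["running_time_in_nanos"] / 1e9
            if seconds >= min_seconds:
                found.append((node.get("name"), task["action"], seconds))
    # Longest-running tasks first.
    return sorted(found, key=lambda item: item[2], reverse=True)


# Made-up sample in the shape of a GET /_tasks response.
sample = {
    "nodes": {
        "abc": {
            "name": "elasticsearch-master-2",
            "tasks": {
                "abc:1": {
                    "action": "cluster:monitor/nodes/stats[n]",
                    "running_time_in_nanos": 12_500_000_000,
                },
                "abc:2": {
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 40_000_000,
                },
            },
        }
    }
}

if __name__ == "__main__":
    for node_name, action, seconds in slow_tasks(sample):
        print(f"{node_name} {action} {seconds:.1f}s")
```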

As a workaround we shut down the exporter. Is there a well-known bug behind this behavior, or any way to find the root cause of the unexpected slowness?

Thank you.


Hot threads report taken during a long-running task, from /_nodes/_local/hot_threads

Looks like the caching introduced in #82721 should fix this for you. You should upgrade anyway, v7.5 is very old and long past EOL so it's not supported any more.


Thank you David for the reply!