Timeout and slow responses

Hi,
I have an Elasticsearch cluster running in my k8s cluster that serves a single small index (451k docs; stats here: ES Cluster stats as at 20211117 · GitHub). The index is so small (2 GB) that I use 1 shard. All 5 node pods have 1 CPU and 3 GiB RAM, and the JVM is configured with Xms = Xmx = 2 GB.

creations 0 p STARTED 451887   2gb 10.42.9.3   elasticsearch-master-1
creations 0 r STARTED 451882 1.9gb 10.42.10.34 elasticsearch-master-0
creations 0 r STARTED 451884 1.9gb 10.42.11.4  elasticsearch-master-3
creations 0 r STARTED 451879 1.5gb 10.42.5.125 elasticsearch-master-4
creations 0 r STARTED 451887   2gb 10.42.0.180 elasticsearch-master-2

As you can see, there's 1 primary and 4 replicas.
My problems:

  • slow query responses
  • slow updates of the index

Both run into many timeouts and some "org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed" errors: Some ES timeout logs · GitHub

How could I improve my response times and avoid the timeouts? Should I give more resources to the pods? Use multiple primary shards?

Thank you for your pointers.


What type of storage are you using?

No persistence at all, so I guess in-memory plus ephemeral storage. The nodes mainly have 7200 rpm HDDs, and some nodes should have SSDs.
Does that help?

Elasticsearch does not run in memory, so the storage used will matter and can often be the bottleneck. I would recommend looking into storage performance.

Elasticsearch also relies on the operating system page cache for performance, so the fact that you have assigned over 50% of the available RAM to heap probably affects performance negatively too.
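To make the memory point concrete, here is a rough sketch of the arithmetic, assuming the 3 GiB pod size and 2 GiB heap mentioned in the first post. The function name `memory_budget` is only illustrative, and the JVM's own off-heap overhead is not measured here, so the real amount left for the page cache will be somewhat lower.

```python
GIB = 1024 ** 3

def memory_budget(container_bytes, heap_bytes):
    """Return (heap share of container RAM, bytes left outside the heap)."""
    heap_fraction = heap_bytes / container_bytes
    off_heap = container_bytes - heap_bytes
    return heap_fraction, off_heap

# Assumed values taken from the first post: 3 GiB per pod, 2 GiB heap.
heap_fraction, off_heap = memory_budget(3 * GIB, 2 * GIB)

print(f"heap share of pod RAM: {heap_fraction:.0%}")
print(f"left for page cache and JVM off-heap use: {off_heap / GIB:.1f} GiB")
```

With about 1 GiB left outside the heap and a shard copy of roughly 2 GB on every node, the index probably cannot stay fully cached, so many reads would end up going to disk.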

Thank you for the hint. Here is an hdparm test for one of the nodes (all should be comparable):

/dev/md2:
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 readonly      =  0 (off)
 readahead     = 4096 (on)
 geometry      = 233225216/2/4, sectors = 1865801728, start = unknown
 Timing cached reads:   13872 MB in  1.99 seconds = 6963.47 MB/sec
 Timing buffered disk reads: 1482 MB in  3.01 seconds = 493.16 MB/sec

That doesn't seem very low, especially for such a small index, don't you think? What performance should I reach to avoid the timeouts?

Elasticsearch does a lot of reasonably small reads and writes, with frequent fsyncs during indexing, so I am not sure that a sequential read benchmark like this one is representative.
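A quick way to get a feel for that access pattern is to time small writes that are each followed by an fsync. The following is only a minimal sketch, not a replacement for a dedicated I/O benchmarking tool: the helper name `fsync_write_latency`, the 4 KiB block size and the write count are all assumptions, and the directory should be changed to the path Elasticsearch actually stores its data on.

```python
import os
import tempfile
import time

def fsync_write_latency(directory, writes=50, size=4096):
    """Write `writes` blocks of `size` bytes, calling fsync after each one.

    Returns the per-write latencies in seconds, sorted ascending.
    """
    payload = os.urandom(size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the block to stable storage before continuing
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return sorted(latencies)

# Assumption: the system temp directory stands in for the Elasticsearch data path.
latencies = fsync_write_latency(tempfile.gettempdir(), writes=50)

median = latencies[len(latencies) // 2]
worst = latencies[-1]
print(f"median fsync write: {median * 1000:.2f} ms, worst: {worst * 1000:.2f} ms")
```

On a 7200 rpm HDD each fsync typically costs several milliseconds, which hdparm's throughput figures do not show; comparing the numbers between an HDD node and an SSD node would indicate how much the storage contributes to the timeouts.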
