Elasticsearch 7.17.10 indexing bottleneck on i3.2xlarge and d3.2xlarge nodes in EKS

Still pretty concentrated.

Indexing rate by node, 7d:

Write queue by node, 7d:

Write queue by node, 2d (to see more recent data):
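
For a point-in-time snapshot of the same data, the write thread pool stats can also be pulled with the cat API (the columns here are just the ones I happen to find useful):

GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected,completed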

The output is too large for a comment - what's the preferred way to provide it?

Any idea why? Are the hotter nodes holding more of the shards that are seeing heavy indexing? There are still a lot of empty queues here.

Can you use https://gist.github.com?

Thanks. So again the hot threads output indicates most nodes are underutilised, with many idle write threads. There are a few over-hot nodes that could be a bottleneck though, and alerts-es-hot-8 looks to be struggling a bit with IO, spending a lot of time in force0 (i.e. fsync):

$ cat es-tasks | grep -B2 force0
   100.0% [cpu=38.2%, other=61.8%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[alerts-es-hot-8][write][T#8]'
     5/10 snapshots sharing following 27 elements
       java.base@20.0.1/sun.nio.ch.UnixFileDispatcherImpl.force0(Native Method)
--
   100.0% [cpu=33.3%, other=66.7%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[alerts-es-hot-8][write][T#6]'
     5/10 snapshots sharing following 27 elements
       java.base@20.0.1/sun.nio.ch.UnixFileDispatcherImpl.force0(Native Method)
--
       java.base@20.0.1/java.lang.Thread.run(Thread.java:1623)
     unique snapshot
       java.base@20.0.1/sun.nio.ch.UnixFileDispatcherImpl.force0(Native Method)
--
   100.0% [cpu=22.9%, other=77.1%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[alerts-es-hot-8][write][T#1]'
     5/10 snapshots sharing following 27 elements
       java.base@20.0.1/sun.nio.ch.UnixFileDispatcherImpl.force0(Native Method)
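
(For anyone following along: the es-tasks file above is hot threads output, i.e. what you get from something like

GET _nodes/hot_threads?threads=3&interval=500ms

where threads=3 and interval=500ms are the defaults - the exact parameters used here are an assumption on my part.)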

I'd suggest focussing on why there's such an imbalance in work across your nodes.

Is there a recommended way to do this with 7.17.10? I thought that shard allocation based on size or write load only became available in 8.x.

Thanks - OK, no significant holdups at the network layer, it seems.

First, work out whether it's actually needed (using e.g. GET _cat/shards). If there is a hot spot for a particular index, it's sometimes because the shards don't divide evenly across the available nodes, so it can be addressed just by changing the number of shards/replicas. Otherwise you can use the index.routing.allocation.total_shards_per_node index setting to force the shards of particular indices to spread out.
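
To make that concrete, a rough sketch (my-hot-index and the value 1 are placeholders - pick a total_shards_per_node that matches your shard/replica counts and node count, and note that setting it too low can leave shards unassigned):

GET _cat/shards?v&h=index,shard,prirep,node,docs,store&s=index

PUT my-hot-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 1
}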

There are 217 empty indices that were created prematurely, but I've tested removing them without any noticeable impact on throughput - I was hoping that after rebalancing there would be fewer "lucky" nodes left holding mostly-idle shards. I also used the API to move busy shards from the busy nodes to the nodes with the lowest CPU usage, but that's obviously not a sustainable practice unless I write a service to redistribute them for me. I'd like to move to ES 8, which might help, but I'm getting some pressure to consider using OpenSearch to offload that troubleshooting effort, and upgrading past 7 makes that path a bit harder. I'd prefer to stay with ES, so thank you again for all your help in this thread - it's very much appreciated.
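
(The manual moves were via the cluster reroute API, roughly like this - the index name, shard number and destination node here are placeholders:)

POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "my-hot-index",
        "shard": 3,
        "from_node": "alerts-es-hot-8",
        "to_node": "alerts-es-hot-2"
      }
    }
  ]
}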

I believe this was addressed in this part of the thread - only one index would be affected, and I've dropped that one from 5 shards to 4, which should allow it to be distributed across all nodes without any overlap.
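
(Assuming the index is created from an index template - the template name and pattern below are made up for illustration - the shard count change is just:)

PUT _index_template/alerts-hot
{
  "index_patterns": ["alerts-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 4
    }
  }
}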

OpenSearch/OpenDistro are AWS run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

AFAIK OpenSearch uses the same shard allocation algorithm as Elasticsearch 7.x, so I don't think that will change anything.

Since you're managing this Elasticsearch cluster via ECK, could you provide the Elasticsearch YAML file that you're using to deploy your cluster?

Yeah, the OpenSearch piece is just to make it someone else's problem with a support contract.
