Data node CPU utilization goes very high

sheetal.nainwal · August 18, 2020, 4:02pm

I am using the AWS Elasticsearch service. The problem I am facing is that while ingestion is running on one index, CPU utilization on the data nodes becomes very high (sometimes 1 or 2 master/data nodes go down, and sometimes ingestion fails). As a result, it logs warnings/errors such as:

[WARN ][o.e.c.NodeConnectionsService] [7d6f3c47d7f582ced2c090fbf6a3afe5] failed to connect to node {a80c4027a8ff7917bd4f7h8j9k8g5f4d}{PsiWPcLZQEarONo143BLzQ}{ER6LljwCQfGnD4nm42n2TQ}{__IP__}{__IP__}{distributed_snapshot_deletion_enabled=true, __AMAZON_INTERNAL__, __AMAZON_INTERNAL__, cross_cluster_transport_address=__IP__} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [a80c4027a8ff7917bc0c3767dde0f72e][__IP__] connect_exception
    at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1299) ~[elasticsearch-7.1.1.jar:7.1.1]
    at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:99) ~[elasticsearch-7.1.1.jar:7.1.1]

[WARN ][o.e.t.TransportService ] [0b1fd2f1876cdcee4abd7a1dcee545454f] Received response for a request that has timed out, sent [18406ms] ago, timed out [3401ms] ago, action [__PATH__[n]], node [{784e7ea9800931208c1a36c04db940e3}{SITLwWDBTpKr2kKrHvkIRQ}{Gq22BKc7SYSuorfCb5lrKA}{__IP__}{__IP__}{distributed_snapshot_deletion_enabled=true, __AMAZON_INTERNAL__, __AMAZON_INTERNAL__, cross_cluster_transport_address=__IP__}], id [3297456]

My ES cluster has:
Data nodes: 3 (i3.2xlarge.elasticsearch)
Master nodes: 3 (c5.xlarge.elasticsearch)
Number of indexes: 18 (average size of each index is 15 GB)

Nine of the indexes have 1 primary and 2 replica shards; the remaining nine have 2 primary and 2 replica shards.
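
As a rough sketch of what this layout implies, the arithmetic below counts shard copies. It assumes "2 replica shards" means `number_of_replicas: 2` (two replica copies of every primary), which the post does not state outright; the names `index_groups` and `total_shards` are made up for illustration.

```python
# Assumption: "2 replicas" means number_of_replicas=2,
# i.e. every primary shard has two replica copies.
index_groups = [
    {"indexes": 9, "primaries": 1, "replicas": 2},
    {"indexes": 9, "primaries": 2, "replicas": 2},
]
data_nodes = 3


def total_shards(groups):
    """Count every shard copy (primaries plus replicas) across all indexes."""
    return sum(
        g["indexes"] * g["primaries"] * (1 + g["replicas"]) for g in groups
    )


shards = total_shards(index_groups)
print(shards)               # total shard copies in the cluster
print(shards / data_nodes)  # average shard copies per data node
```

Under that assumption each shard exists in three copies on a three-data-node cluster, and since a primary and its replicas are never allocated to the same node, every data node holds a copy of every shard and takes part in every write.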

warkolm · August 18, 2020, 9:03pm

Welcome to our community! Please note that AWS ES is a fork of what we offer, so we may not be able to provide support for this problem.

What sort of monitoring functionality do you have in place?

sheetal.nainwal · August 19, 2020, 6:05am

Thank you, Mark. Yes, you are right, but I wanted a bit more insight into what could be happening on a data node when it starts consuming more CPU.

warkolm · August 19, 2020, 6:06am

OK, so:

Have you looked at hot threads?
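
For reference, hot threads can be pulled from the `_nodes/hot_threads` endpoint while the CPU spike is in progress. The snippet below is a minimal sketch using only the Python standard library; the endpoint URL is a placeholder, the helper names are hypothetical, and it assumes the domain accepts unsigned HTTP requests (an AWS domain with IAM-based access would need signed requests instead).

```python
import urllib.parse
import urllib.request


def hot_threads_url(endpoint, threads=3, interval="500ms", thread_type="cpu"):
    """Build the nodes hot threads URL for a cluster endpoint.

    threads, interval and type are query parameters of the hot threads API;
    the values used here match its documented defaults.
    """
    query = urllib.parse.urlencode(
        {"threads": threads, "interval": interval, "type": thread_type}
    )
    return f"{endpoint.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(endpoint, timeout=30):
    """Return the plain-text hot threads report for all nodes."""
    with urllib.request.urlopen(hot_threads_url(endpoint), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (placeholder endpoint, not a real domain):
# print(fetch_hot_threads("https://my-es-domain.example.com"))
```

The report is plain text, grouped per node, and lists the busiest threads with their stack traces. Capturing it a few times during ingestion should show whether the CPU is going to indexing (write threads), segment merges, searches, or something else.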

system · September 16, 2020, 6:06am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.