We have an old production cluster running version 1.6.2 (yes, we should upgrade).
We have 3 master nodes (one is also a data node) and 9 data nodes.
At the moment we are facing a problem: after restarting the master nodes (one by one), the cluster state is red.
Some info:
active_primary_shards : 1138
active_shards : 2276
No pending tasks
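To find which shards are behind the red state, I checked with something like this (a sketch; host and port are assumed local defaults, adjust as needed):

```shell
# Overall health summary, including the number of unassigned shards (works on 1.x)
curl -s 'http://localhost:9200/_cluster/health?pretty'

# List only the shards that are not allocated
curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED
```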
We are using some plugins.
On the master nodes, we see some logs:
[2019-05-08 10:31:25,396][DEBUG][http.netty ] [lg2] Caught exception while handling client http traffic, closing connection
[2019-05-08 10:31:15,287][DEBUG][action.admin.cluster.node.stats] [lg2] failed to execute on node [xFAXo7BAT6O4lxW7hvWVQA] org.elasticsearch.transport.ReceiveTimeoutTransportException: [lg7][inet[/X.X.X.X:9300]][cluster:monitor/nodes/stats[n]] request_id  timed out after [15000ms
[2019-05-08 10:34:03,595][WARN ][repositories ] [lg10] failed to create repository [swift][swift_backup]
org.elasticsearch.common.settings.NoClassSettingsException: failed to load class with value [swift]; tried [swift, org.elasticsearch.repositories.SwiftRepositoryModule, org.elasticsearch.repositories.swift.SwiftRepositoryModule, org.elasticsearch.repositories.swift.SwiftRepositoryModule]
I guess the first log is just a client closing its HTTP connection and is not really important.
The second log is more important: node lg7, a data node, is timing out on node stats requests.
In Kopf, I see heavy load on it; the other nodes have low load.
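Before restarting anything, I wanted to see what lg7 is actually busy with. The hot threads API (available in 1.x) can show that; this is a sketch assuming the node name lg7 and the default HTTP port:

```shell
# Dump the busiest threads on node lg7 to see what is eating CPU
curl -s 'http://localhost:9200/_nodes/lg7/hot_threads?threads=5'
```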
The third log concerns the Swift object storage where we put snapshots. I removed the plugin (elasticsearch/bin/plugin -r swift-repository-plugin) to be sure it is not the problem, but one master keeps emitting those logs.
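My understanding is that the repository definition lives in the cluster state and survives plugin removal, so deregistering it might stop the log spam. A sketch, assuming the repository is registered under the name swift_backup as the log suggests:

```shell
# Deregister the repository; this removes only the registration,
# the snapshot data in the Swift storage itself is not deleted
curl -XDELETE 'http://localhost:9200/_snapshot/swift_backup'
```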
I don't know where the problem is. Should I restart the data node with the high load?
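If I do restart it, I understand the usual practice is to disable shard allocation first so shards are not shuffled around while the node is down, then re-enable it afterwards (a sketch, using the 1.x setting name):

```shell
# Disable shard allocation before restarting the node
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# ... restart the node and wait for it to rejoin the cluster ...

# Re-enable allocation so shards can recover
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```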
When we open the Kibana interface, sometimes it works and sometimes we get "could not contact elasticsearch".
Any idea?