Elastic cluster problem

Hello,

We have an old production cluster running version 1.6.2 (yes, we should upgrade).

We have 3 master nodes (one of which is also a data node) and 9 data nodes.

At the moment we are facing a problem: after restarting the master nodes (one by one), the cluster status is red.

Some information from the cluster health API:
- active_primary_shards: 1138
- active_shards: 2276
- unassigned_shards: 6
- no pending tasks
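The figures above come from the cluster health API (GET /_cluster/health). A minimal Python sketch for pulling and summarizing them, using only the standard library and assuming a node that answers HTTP on `http://localhost:9200` (a placeholder host). `fetch_health` and `summarize_health` are hypothetical helpers written for illustration.

```python
import json
import urllib.request

# Placeholder address (assumption): point this at one of your nodes.
ES_URL = "http://localhost:9200"


def fetch_health(base_url: str = ES_URL) -> dict:
    """Fetch GET /_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_health(health: dict) -> str:
    """Build a one-line summary from a _cluster/health response (hypothetical helper)."""
    status = health.get("status", "unknown")
    summary = (
        f"status={status} "
        f"primaries={health.get('active_primary_shards', 0)} "
        f"active={health.get('active_shards', 0)} "
        f"unassigned={health.get('unassigned_shards', 0)} "
        f"pending={health.get('number_of_pending_tasks', 0)}"
    )
    # A red status means at least one primary shard is not allocated,
    # so some indices cannot serve all of their data.
    if status == "red":
        summary += " -> at least one primary shard is unassigned"
    return summary
```

Usage would be `print(summarize_health(fetch_health()))`, run from a machine that can reach the cluster.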

We are using the following plugins:
- xml
- kopf
- paramedic
- HQ
- bigdesk
- swift-repository

On the master nodes, we see the following log entries:

  1. [2019-05-08 10:31:25,396][DEBUG][http.netty ] [lg2] Caught exception while handling client http traffic, closing connection

  2. [2019-05-08 10:31:15,287][DEBUG][action.admin.cluster.node.stats] [lg2] failed to execute on node [xFAXo7BAT6O4lxW7hvWVQA] org.elasticsearch.transport.ReceiveTimeoutTransportException: [lg7][inet[/X.X.X.X:9300]][cluster:monitor/nodes/stats[n]] request_id [9143070] timed out after [15000ms

  3. [2019-05-08 10:34:03,595][WARN ][repositories ] [lg10] failed to create repository [swift][swift_backup]
    org.elasticsearch.common.settings.NoClassSettingsException: failed to load class with value [swift]; tried [swift, org.elasticsearch.repositories.SwiftRepositoryModule, org.elasticsearch.repositories.swift.SwiftRepositoryModule, org.elasticsearch.repositories.swift.SwiftRepositoryModule]

I guess the first entry is just a client closing its HTTP connection and is not really important.

The second entry seems more important: it looks like the node stats request to lg7, which is a data node, timed out after 15 seconds.
In kopf, I can see that lg7 has a heavy load, while the other nodes have a low load.
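To check whether lg7 is the only node that times out, one could count these entries per target node across the master logs. A small sketch, assuming every timeout line keeps the format of the second sample above; the regex and the function names (`count_timeouts`, `count_timeouts_in_file`) are hypothetical.

```python
import re
from collections import Counter

# Built from the single sample line above (assumption): captures the node
# name in the first [...] right after the exception class name.
TIMEOUT_RE = re.compile(
    r"ReceiveTimeoutTransportException: \[(?P<node>[^\]]+)\]"
    r".*?timed out after \[\d+ms"
)


def count_timeouts(lines):
    """Count ReceiveTimeoutTransportException entries per target node name."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts


def count_timeouts_in_file(path):
    """Read one log file (path chosen by you) and count timeouts per node."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_timeouts(handle)
```

If a single node dominates the counts, that supports the idea that the problem is local to that node rather than cluster-wide.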

The third entry relates to the object storage (Swift) where we store our snapshots. I removed the plugin (elasticsearch/bin/plugin -r swift-repository-plugin) to make sure it is not the cause, but one master (lg10) keeps logging this error.
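Removing the plugin does not necessarily remove the repository definition: as far as I understand, registered repositories are kept in the cluster metadata, so nodes presumably keep trying to instantiate swift_backup and fail because the swift type can no longer be loaded. A sketch that lists registered repositories of a given type from the GET /_snapshot/_all response, assuming a node at `http://localhost:9200` (placeholder); `fetch_repositories` and `repositories_of_type` are hypothetical helpers.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder host (assumption)


def fetch_repositories(base_url: str = ES_URL) -> dict:
    """Return the parsed body of GET /_snapshot/_all (repository name -> definition)."""
    with urllib.request.urlopen(base_url + "/_snapshot/_all", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def repositories_of_type(repos: dict, repo_type: str) -> list:
    """Sorted names of registered repositories whose "type" equals repo_type."""
    return sorted(
        name for name, definition in repos.items()
        if definition.get("type") == repo_type
    )
```

For example, `repositories_of_type(fetch_repositories(), "swift")`. If swift_backup is still listed and no longer needed, the snapshot API also accepts DELETE /_snapshot/swift_backup to unregister it; this removes only the registration, not the snapshot files, but it is worth trying on a non-production cluster first.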

I don't know where the problem is. Should I restart the data node with the heavy load?

When we open the Kibana interface, it sometimes works, and sometimes we get "could not contact elasticsearch".

Any ideas?

Regards

Hi,

I solved my issue by following this article: https://thoughts.t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode-ce196e20ba95
and by restarting the node where the timeouts occurred.
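After a fix like this, one way to confirm that no shards remain stuck is to look for any shard not in the STARTED state in the GET /_cat/shards output. A minimal sketch, not taken from the linked article, assuming the default text output without headers (columns index, shard, prirep, state, ...) and a node at `http://localhost:9200` (placeholder); `fetch_cat_shards` and `stuck_shards` are hypothetical helpers.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder host (assumption)


def fetch_cat_shards(base_url: str = ES_URL) -> str:
    """Return the plain-text output of GET /_cat/shards (no ?v, so no header row)."""
    with urllib.request.urlopen(base_url + "/_cat/shards", timeout=30) as resp:
        return resp.read().decode("utf-8")


def stuck_shards(cat_output: str) -> list:
    """Return (index, shard, prirep, state) for every shard that is not STARTED.

    Assumes the default column order: index shard prirep state ...
    """
    stuck = []
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            stuck.append((index, int(shard), prirep, state))
    return stuck
```

An empty list from `stuck_shards(fetch_cat_shards())` means every shard is started; UNASSIGNED or INITIALIZING entries that never go away point at the shards still blocking a green status.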

Regards
