Hi, I have ES 6.3.1 running in a Docker container with a volume where the data is stored.
I post images as base64 in the documents, which has worked very well over the last year.
It may be a coincidence, but after a backup and restore of the data volume with volumerize, this exception suddenly occurs:
[2018-07-26T13:54:35,803][WARN ][r.suppressed ] path: /attachments/attachment/upload-1532613215528, params: {refresh=true, index=attachments, id=upload-1532613215528, type=attachment, timeout=1m}
elasticsearch | org.elasticsearch.action.UnavailableShardsException: [attachments][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[attachments][0]] containing [index {[attachments][attachment][upload-1532613215528], source[n/a, actual length: [672.8kb], max length: 2kb]}] and a refresh]
elasticsearch | at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:928) [elasticsearch-6.3.2.jar:6.3.2]
As far as I could find in other forum posts, this can be caused by a very high shard count or by requests that follow each other immediately. I have 10 shards for this index (standard settings), and everything runs on the same machine, so there should be no network-related timeout.
Any ideas where I could look to get better insight into what is causing the problem?
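Since the exception names the primary of shard 0 of the attachments index as inactive, one place to look might be the cluster allocation explain API, which reports why a given shard copy is unassigned. Below is a minimal sketch using only the JDK; the class name AllocationExplainProbe and the base URL http://localhost:9200 are assumptions (it presumes the HTTP port of the container is published on the host).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper, not part of any Elasticsearch client library.
final class AllocationExplainProbe {

    // Builds the JSON body that asks about one specific shard copy.
    static String explainBody(String index, int shard, boolean primary) {
        return "{\"index\":\"" + index + "\",\"shard\":" + shard + ",\"primary\":" + primary + "}";
    }

    // Sends the request and returns the raw JSON answer, including error bodies.
    static String explain(String baseUrl, String index, int shard, boolean primary) throws IOException {
        URL url = new URL(baseUrl + "/_cluster/allocation/explain");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(explainBody(index, shard, primary).getBytes(StandardCharsets.UTF_8));
        }
        // A non-2xx answer can still carry a JSON body on the error stream.
        InputStream in = conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        try (Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Elasticsearch answers HTTP on localhost:9200.
        System.out.println(explain("http://localhost:9200", "attachments", 0, true));
    }
}
```

The answer should contain the allocation decision for that shard copy, which is usually more specific than the timeout in the log.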
EDIT:
I found a proposed solution: request a health check that waits for an active shard before indexing.
ClusterHealthRequest healthRequest = new ClusterHealthRequest();
healthRequest.waitForActiveShards(1);
try {
    ActionFuture<ClusterHealthResponse> health = ConnectorManager.getTransportClient().admin().cluster().health(healthRequest);
    ConnectorManager.getElastic().index(indexRequest);
} catch (IOException e) {
    response.sendError(503, e.getLocalizedMessage());
    LOG.error(e.getLocalizedMessage(), e.getCause());
    throw new IOException(e);
} catch (Exception e) {
    response.sendError(503, e.getLocalizedMessage());
    LOG.error(e.getLocalizedMessage(), e.getCause());
    throw new IOException(e);
}
But all I get is an exception with:
None of the configured nodes are available: [{#transport#-1}{toFzQQVmRK2g4DKkhAbMNQ}{localhost}{127.0.0.1:9300}]
Port 9300 is exposed on the Docker container for ES.
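Two observations on the attempt above. First, the ActionFuture is never waited on (for example with actionGet()), so the index call would run regardless of the health result. Second, the cluster reports its name as docker-cluster in the health output further down, and a transport client left on the default cluster name might reject the node for that reason; this is only a guess. As an alternative that avoids the transport port entirely, here is a minimal sketch that asks the REST health endpoint to block until the index is at least yellow. The class name IndexHealthProbe and the base URL http://localhost:9200 are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

// Hypothetical helper, not part of any Elasticsearch client library.
final class IndexHealthProbe {

    // Builds a health URL that blocks server-side until the index reaches
    // the requested status or the timeout expires.
    static String healthUrl(String baseUrl, String index, String waitForStatus, int timeoutSeconds) {
        return baseUrl + "/_cluster/health/" + index
                + "?wait_for_status=" + waitForStatus
                + "&timeout=" + timeoutSeconds + "s";
    }

    // Fetches the URL and returns the body, also for non-2xx answers.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        InputStream in = conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        try (Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    // Crude string check: the wait succeeded if the response did not time out.
    static boolean waitSucceeded(String healthJson) {
        return healthJson.replaceAll("\\s", "").contains("\"timed_out\":false");
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Elasticsearch answers HTTP on localhost:9200.
        String json = fetch(healthUrl("http://localhost:9200", "attachments", "yellow", 30));
        System.out.println(waitSucceeded(json) ? "primaries active, safe to index" : "index still red: " + json);
    }
}
```

Yellow is the target here because it means all primaries are active, which is what an index request needs. The check on timed_out is deliberately simple; a real application would parse the JSON properly.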
EDIT II:
Checking the cluster health of the affected index, it says RED but gives me only a vague clue:
{
  "cluster_name": "docker-cluster",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 3,
  "active_shards": 3,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 7,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 48.76543209876543,
  "indices": {
    "attachments": {
      "status": "red",
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "active_primary_shards": 3,
      "active_shards": 3,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 7,
      "shards": {
        "0": {
          "status": "red",
          "primary_active": false,
          "active_shards": 0,
          "relocating_shards": 0,
          "initializing_shards": 0,
          "unassigned_shards": 2
        },
        "1": {
          "status": "red",
          "primary_active": false,
          "active_shards": 0,
          "relocating_shards": 0,
          "initializing_shards": 0,
          "unassigned_shards": 2
        },
        "2": {
          "status": "yellow",
          "primary_active": true,
          "active_shards": 1,
          "relocating_shards": 0,
          "initializing_shards": 0,
          "unassigned_shards": 1
        },
        "3": {
          "status": "yellow",
          "primary_active": true,
          "active_shards": 1,
          "relocating_shards": 0,
          "initializing_shards": 0,
          "unassigned_shards": 1
        },
        "4": {
          "status": "yellow",
          "primary_active": true,
          "active_shards": 1,
          "relocating_shards": 0,
          "initializing_shards": 0,
          "unassigned_shards": 1
        }
      }
    }
  }
}
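To make sense of the numbers above, here is a small sketch of the arithmetic. It assumes Elasticsearch's default behaviour of never placing a replica on the same node as its primary; the class name ShardTally is hypothetical, and the inputs are taken from the health output.

```java
// Hypothetical helper that only does arithmetic on the reported numbers.
final class ShardTally {

    // Every shard has one primary plus its replicas.
    static int totalCopies(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    // Copies of the same shard are not co-located, so each shard can place
    // at most one copy per data node.
    static int maxAssignableCopies(int primaries, int replicas, int dataNodes) {
        return primaries * Math.min(dataNodes, 1 + replicas);
    }

    public static void main(String[] args) {
        int primaries = 5, replicas = 1, dataNodes = 1, activePrimaries = 3;

        int total = totalCopies(primaries, replicas);
        int assignable = maxAssignableCopies(primaries, replicas, dataNodes);
        int replicasWithoutNode = total - assignable;
        int missingPrimaries = primaries - activePrimaries;

        System.out.println("total shard copies:       " + total);
        System.out.println("replicas without a node:  " + replicasWithoutNode);
        System.out.println("primaries not active:     " + missingPrimaries);
        System.out.println("unassigned in total:      " + (replicasWithoutNode + missingPrimaries));
    }
}
```

Read this way, 5 of the 7 unassigned shards are replicas that a single-node cluster cannot host anyway, which only makes the index yellow. The red status comes from the 2 primaries that are not active (shards 0 and 1), and shard 0 is the one named in the exception.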