Hey,
I recently upgraded a few of our clusters from 7.12.0 to 7.17.5.
After the upgrade, two new indices were created:
.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001 and .ds-ilm-history-5-2022.09.07-000001.
Ever since the upgrade, every once in a while the cluster goes red, with one or both of these indices having unassigned shards.
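For context, this is roughly the check I use to spot the unassigned shards when the cluster goes red (the column list is just what I find useful, assuming the _cat/shards API):

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state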
When I run "_cluster/allocation/explain" I get:
{
  "note": "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index": ".ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2022-09-11T05:51:27.373Z",
    "failed_allocation_attempts": 1,
    "details": "failed shard on node [FFM3J3yWSmSX-wlXsrRXWA]: shard failure, reason [translog trimming failed], failure NoSuchFileException[/outbrain/elasticsearch/data/a/2/nodes/0/indices/FdbvZAilSDeoMEEOnbLfSQ/0/translog/translog-17781480244351786913.ckp]",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions": [
    {
      "node_id": "0qpc6AsETcOr-W12d7Pv3Q",
      "node_name": "espc11d-40002-prod-chidc2.chidc2.outbrain.com",
      "transport_address": "10.43.20.69:9301",
      "node_attributes": {
        "ml.machine_memory": "66709622784",
        "ml.max_open_jobs": "512",
        "xpack.installed": "true",
        "ml.max_jvm_size": "16106127360",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "9g5vBmanQ1SJaYY847T3dg",
      "node_name": "2_espc11d-40001-prod-chidc2.chidc2.outbrain.com",
      "transport_address": "10.43.12.17:9300",
      "node_attributes": {
        "ml.machine_memory": "66709131264",
        "ml.max_open_jobs": "512",
        "xpack.installed": "true",
        "ml.max_jvm_size": "16106127360",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "FFM3J3yWSmSX-wlXsrRXWA",
      "node_name": "2_espc11d-40003-prod-chidc2.chidc2.outbrain.com",
      "transport_address": "10.43.27.42:9300",
      "node_attributes": {
        "ml.machine_memory": "66709622784",
        "ml.max_open_jobs": "512",
        "xpack.installed": "true",
        "ml.max_jvm_size": "16106127360",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "in_sync": true,
        "allocation_id": "nxCMwK5XRSehJzpHnZLVWg",
        "store_exception": {
          "type": "file_not_found_exception",
          "reason": "no segments* file found in NIOFSDirectory@/outbrain/elasticsearch/data/a/2/nodes/0/indices/FdbvZAilSDeoMEEOnbLfSQ/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@15411d35: files: []"
        }
      }
    },
    {
      "node_id": "VXfmP9oCSuGTZDoXjR8mLQ",
      "node_name": "espc11d-40003-prod-chidc2.chidc2.outbrain.com",
      "transport_address": "10.43.27.42:9301",
      "node_attributes": {
        "ml.machine_memory": "66709622784",
        "ml.max_open_jobs": "512",
        "xpack.installed": "true",
        "ml.max_jvm_size": "16106127360",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    },
    {
      "node_id": "asd8KjU-SjiPbg8Ce6l8vg",
      "node_name": "2_espc11d-40002-prod-chidc2.chidc2.outbrain.com",
      "transport_address": "10.43.20.69:9300",
      "node_attributes": {
        "ml.machine_memory": "66709622784",
        "ml.max_open_jobs": "512",
        "xpack.installed": "true",
        "ml.max_jvm_size": "16106127360",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "in_sync": false,
        "allocation_id": "HLrIwLgCRxaNUnjv37ItMQ"
      }
    },
    {
      "node_id": "xkkz9w_PTLm_hvYXI-BlaA",
      "node_name": "espc11d-40001-prod-chidc2.chidc2.outbrain.com",
      "transport_address": "10.43.12.17:9301",
      "node_attributes": {
        "ml.machine_memory": "66709131264",
        "ml.max_open_jobs": "512",
        "xpack.installed": "true",
        "ml.max_jvm_size": "16106127360",
        "transform.node": "true"
      },
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}
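(As the note at the top of the response says, the API picked an unassigned shard at random. For completeness, this is roughly how the same request can be pointed at the specific shard, using the index name from above:)

GET _cluster/allocation/explain
{
  "index": ".ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001",
  "shard": 0,
  "primary": true
}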
Looking at the logs of the master node, I see:
[2022-09-11T01:52:27,013][ERROR][o.e.x.d.l.DeprecationIndexingComponent] [espc11m-40001-prod-chidc2.chidc2.outbrain.com] Bulk write of deprecation logs encountered some failures: [[-IkaK4MBQmlD_BTWSvbD RemoteTransportException[[2_espc11d-40003-prod-chidc2.chidc2.outbrain.com][10.43.27.42:9300][indices:data/write/bulk[s]]]; nested: UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001][0]] containing [2] requests]];, -YkaK4MBQmlD_BTWSvbD RemoteTransportException[[2_espc11d-40003-prod-chidc2.chidc2.outbrain.com][10.43.27.42:9300][indices:data/write/bulk[s]]]; nested: UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001][0]] containing [2] requests]];]]
On the data node, I see:
[2022-09-11T02:30:33,145][ERROR][o.e.x.d.l.DeprecationIndexingComponent] [espc11d-40001-prod-chidc2.chidc2.outbrain.com] Bulk write of deprecation logs encountered some failures: [[bEA9K4MB0CoBznwrLD3O UnavailableShardsException[[.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.ds-.logs-deprecation.elasticsearch-default-2022.09.07-000001][0]] containing [index {[.logs-deprecation.elasticsearch-default][_doc][bEA9K4MB0CoBznwrLD3O], source[{"event.dataset": "deprecation.elasticsearch", "@timestamp": "2022-09-11T02:29:30,652-04:00", "log.level": "WARN", "log.logger": "org.elasticsearch.deprecation.rest.RestController", "elasticsearch.cluster.name": "es-paid-campaigns-dup", "elasticsearch.cluster.uuid": "TZAxKhapSH6GwyBWZt92AA", "elasticsearch.node.id": "xkkz9w_PTLm_hvYXI-BlaA", "elasticsearch.node.name": "espc11d-40001-prod-chidc2.chidc2.outbrain.com", "trace.id": "", "message": "Legacy index templates are deprecated in favor of composable templates.", "data_stream.type": "logs", "data_stream.dataset": "deprecation.elasticsearch", "data_stream.namespace": "default", "ecs.version": "1.7", "elasticsearch.event.category": "api", "event.code": "deprecated_route_PUT_/_template/{name}", "elasticsearch.http.request.x_opaque_id": "" }
]}]]]]]
Can you please assist me with figuring out this issue?
Also, is there a way to disable these new data streams? I tried deleting them, but they keep being recreated.
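For reference, the deletes I tried were roughly the following (assuming the data stream names are the backing index names without the .ds- prefix and date suffix):

DELETE _data_stream/.logs-deprecation.elasticsearch-default
DELETE _data_stream/ilm-history-5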
I am blocked from upgrading our prod clusters until I figure this out.