Hi,
I apologize for the large number of edits; I had some trouble with the code blocks since I didn't realize that an extra carriage return needed to precede the preformatted text.
I have a small Elasticsearch cluster that had to be restored from backup, after which I have run into some index issues. I have a private index whose replica shard is stuck in INITIALIZING, ALLOCATION_FAILED status. We can call this Issue 1.
Issue 2 is that there are several indices named like ".monitoring-es-7-2019.07.XX" which are UNASSIGNED, CLUSTER_RECOVERED. Based on my research, I believe these may be rotated out after 15 days or so, but I wouldn't mind clearing this up myself today if possible. I don't think I really need them, so they can be deleted if I can find a way; they are not listed under managed indices in Kibana, I expect because they are hidden/system indices.
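If it is in fact safe to just remove them, my assumption is that something like the following would work (the index name is only an example of the pattern; I'd list them first and delete each bad one):
GET /_cat/indices/.monitoring-es-7-*?v
DELETE /.monitoring-es-7-2019.07.01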
GET /_cluster/allocation/explain?pretty
gives some pretty nice data, but only for Issue 2.
"allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt"
Issue 1 looks more like this:
{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<transport_request>] would be [1027222334/979.6mb], which is larger than the limit of [986061209/940.3mb], real usage: [1027222120/979.6mb], new bytes reserved: [214/214b]",
        "bytes_wanted": 1027222334,
        "bytes_limit": 986061209,
        "durability": "PERMANENT"
      }
    ],
    "type": "circuit_breaking_exception",
    "reason": "[parent] Data too large, data for [<transport_request>] would be [1027222334/979.6mb], which is larger than the limit of [986061209/940.3mb], real usage: [1027222120/979.6mb], new bytes reserved: [214/214b]",
    "bytes_wanted": 1027222334,
    "bytes_limit": 986061209,
    "durability": "PERMANENT"
  },
  "status": 429
}
I have done some research on this "Data too large" and [<transport_request>], but all the topics point to errors that don't seem to match my use case. I've also been trying to understand these limits: I originally had 8GB allocated to these VMs and have since expanded them to 16GB, yet the limit remains at 940.3mb. Should I change this to match my environment more realistically, or is there a different issue affecting me here? It's also interesting that I see these same errors in logstash.log, which I thought would have nothing to do with the indexing side.
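If I understand correctly that the parent breaker limit is a percentage of the JVM heap (95% by default in 7.x, I believe), then 940.3mb suggests the heap is still around the ~1GB default rather than scaling with the VMs' new RAM. My assumption is that I'd need to raise -Xms/-Xmx in jvm.options on each node and restart, e.g. (8g assumes roughly half of the 16GB VM):
-Xms8g
-Xmx8g
and then confirm the new heap size with:
GET /_cat/nodes?v&h=name,heap.max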
Without ?pretty (networking data redacted), Issue 1's output looks like this:
{
  "index" : "private_index",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "ihTj7kz6QQewpWM-FMo6CA",
    "name" : "node1",
    "transport_address" : "node1_ip:9300",
    "attributes" : {
      "ml.machine_memory" : "16656506880",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true"
    },
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "rebalance_only_when_active",
      "decision" : "NO",
      "explanation" : "rebalancing is not allowed until all replicas in the cluster are active"
    },
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed",
  "node_allocation_decisions" : [
    {
      "node_id" : "7LIgwS6NQEuF0H8S-lLZCA",
      "node_name" : "node2",
      "transport_address" : "node2_ip:9300",
      "node_attributes" : {
        "ml.machine_memory" : "16656506880",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[private_index][0], node[7LIgwS6NQEuF0H8S-lLZCA], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=iOPv9-x_R26AMQQxMbyF_A], unassigned_info[[reason=ALLOCATION_FAILED], at[2019-07-10T16:39:15.841Z], failed_attempts[1], delayed=false, details[failed shard on node [7LIgwS6NQEuF0H8S-lLZCA]: master {node1}{ihTj7kz6QQewpWM-FMo6CA}{O6ARr-ijQSuaCDIYQ6phFQ}{node1_ip}{node1_ip:9300}{ml.machine_memory=16656506880, ml.max_open_jobs=20, xpack.installed=true} has not removed previously failed shard. resending shard failure], allocation_status[no_attempt]]]"
        }
      ]
    }
  ]
}
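Given the failed_attempts[1] and "has not removed previously failed shard" detail above, would it make sense, once the memory question is sorted out, to simply ask the cluster to retry the failed allocation? For example:
POST /_cluster/reroute?retry_failed=true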