Index management and rollover

I have set up ILM and everything was going well until the index didn't roll over when I expected it to. I have it set to 30 GB or 60 days. My first question: do both conditions have to be met before it rolls over, or just one? If only one condition is required, how do I figure out why it isn't rolling over?

Thanks in advance

Are there logs for ILM so I can see what is going on with the rollover?

Thanks

Does anyone have any thoughts?

Here is my new question. My index is in the process of rolling over. I now have 2 indices, but the 1st index is still being written to. Its action status says: "Waiting for all shard copies to be active". How do I get it to finish moving over? Some background info: I am moving the index from hot to warm and shrinking it to 1 shard.

Thanks for your help!

Hi Marcell0e,

To answer your first question, only one of the conditions has to be met to roll over.
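For reference, a hot-phase rollover block covering both of the thresholds you described would look something like this (a sketch only; the policy name is taken from your output below, and you'd adjust the values to taste):

```
PUT _ilm/policy/packetbeat-7.3.0
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "60d"
          }
        }
      }
    }
  }
}
```

Whichever condition is hit first triggers the rollover.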

As for your second question, I think the output of GET /<index>/_ilm/explain?human would help track down what's going on.

After running the command, this is the output. The message is "Waiting for all shard copies to be active". What do I need to do to get it to finish?

{
  "indices" : {
    "packetbeat-7.3.0-000001" : {
      "index" : "packetbeat-7.3.0-000001",
      "managed" : true,
      "policy" : "packetbeat-7.3.0",
      "lifecycle_date" : "2019-10-13T07:03:42.553Z",
      "lifecycle_date_millis" : 1570950222553,
      "phase" : "warm",
      "phase_time" : "2019-10-13T07:03:43.596Z",
      "phase_time_millis" : 1570950223596,
      "action" : "allocate",
      "action_time" : "2019-10-13T07:13:42.800Z",
      "action_time_millis" : 1570950822800,
      "step" : "check-allocation",
      "step_time" : "2019-10-13T07:13:43.080Z",
      "step_time_millis" : 1570950823080,
      "step_info" : {
        "message" : "Waiting for all shard copies to be active",
        "shards_left_to_allocate" : -1,
        "all_shards_active" : false,
        "actual_replicas" : 1
      },
      "phase_execution" : {
        "policy" : "packetbeat-7.3.0",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "allocate" : {
              "include" : { },
              "exclude" : { },
              "require" : {
                "box_type" : "warm"
              }
            },
            "shrink" : {
              "number_of_shards" : 1
            },
            "set_priority" : {
              "priority" : 30
            }
          }
        },
        "version" : 7,
        "modified_date" : "2019-09-06T17:53:12.509Z",
        "modified_date_in_millis" : 1567792392509
      }
    }
  }
}

Thanks

Marcell0e,

To get information about why a shard isn't ready, have a look at the Cluster allocation explain API.

I have taken a look at the documentation but don't see the reason for my issue.

    "message" : "Waiting for all shard copies to be active",
    "shards_left_to_allocate" : -1,
    "all_shards_active" : false,
    "actual_replicas" : 1

Any suggestions?

Marcell0e,

If you follow the documentation and ask the cluster about an unallocated shard, you will get a response with a node_allocation_decisions section that gives the specific explanation.
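As a sketch (using the index name from your earlier output; adjust shard and primary to target the copy that is unassigned), the request for an unallocated replica would look something like:

```
POST _cluster/allocation/explain
{
  "index": "packetbeat-7.3.0-000001",
  "shard": 0,
  "primary": false
}
```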

This is the response for the new index. I'm not sure how to fix it; I think I have some settings configured incorrectly on the cluster.

{
  "index" : "packetbeat-7.3.0-000002",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "0j54YASqTNOnOKXpr4Nz5A",
    "name" : "INDY-LOGSRV01",
    "transport_address" : "10.3.200.45:9300",
    "attributes" : {
      "ml.machine_memory" : "68718940160",
      "xpack.installed" : "true",
      "box_type" : "hot",
      "ml.max_open_jobs" : "20"
    },
    "weight_ranking" : 2
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed, even though there is at least one node on which the shard can be allocated",
  "node_allocation_decisions" : [
    {
      "node_id" : "Xzmh5yzfRhOPwRK0gBiujw",
      "node_name" : "INDY-LOGSRV02",
      "transport_address" : "10.3.200.46:9300",
      "node_attributes" : {
        "ml.machine_memory" : "68718940160",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "box_type" : "warm"
      },
      "node_decision" : "yes",
      "weight_ranking" : 1
    },
    {
      "node_id" : "oIyIKckYTAia9O8L8DYFRw",
      "node_name" : "INDY-LOGSRV05",
      "transport_address" : "10.3.200.49:9300",
      "node_attributes" : {
        "ml.machine_memory" : "25756258304",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "box_type" : "cold"
      },
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[packetbeat-7.3.0-000002][0], node[oIyIKckYTAia9O8L8DYFRw], [R], s[STARTED], a[id=6NiJNvZUSkejxBwhbNm0zQ]]"
        }
      ]
    }
  ]
}

This is the 1st index response.

{
  "index" : "packetbeat-7.3.0-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "Xzmh5yzfRhOPwRK0gBiujw",
    "name" : "INDY-LOGSRV02",
    "transport_address" : "10.3.200.46:9300",
    "attributes" : {
      "ml.machine_memory" : "68718940160",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "box_type" : "warm"
    },
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "rebalance_only_when_active",
      "decision" : "NO",
      "explanation" : "rebalancing is not allowed until all replicas in the cluster are active"
    },
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed",
  "node_allocation_decisions" : [
    {
      "node_id" : "oIyIKckYTAia9O8L8DYFRw",
      "node_name" : "INDY-LOGSRV05",
      "transport_address" : "10.3.200.49:9300",
      "node_attributes" : {
        "ml.machine_memory" : "25756258304",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "box_type" : "cold"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        }
      ]
    },
    {
      "node_id" : "0j54YASqTNOnOKXpr4Nz5A",
      "node_name" : "INDY-LOGSRV01",
      "transport_address" : "10.3.200.45:9300",
      "node_attributes" : {
        "ml.machine_memory" : "68718940160",
        "xpack.installed" : "true",
        "box_type" : "hot",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        }
      ]
    }
  ]
}

Tell me if I am wrong. I think my issue is that I have 1 hot node, and the index has 1 primary shard and 1 replica shard. The replica is on my warm node, and that is causing the problem. If I create another hot node, will that solve my problem? Is there a way to move all the replicas to the new hot node?

Thank you for your help.

Yes, if you have an index configured to require shard allocation to a particular node type, and there is only one node of that type available, then it isn't possible to allocate any replica shards for that index.

You will note that in this state, with replicas unallocated, no rebalancing can occur (as the explanation says).
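If it helps, one way to list shards and their state (a sketch using the standard cat shards API; the column selection is just a suggestion) is:

```
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state
```

Any row with state UNASSIGNED is a shard copy the cluster hasn't been able to place.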

@Glen_Smith I have created another hot node. Is there anything I need to do to get the shards moved over to the new hot node, or will it happen in time?

Thank you for your help.

This should happen automatically when you add the node to the cluster, provided allocation is enabled. Do you still have unallocated "hot" replicas?

How do I see unallocated "hot" replicas? I have 2 unallocated shards. Sorry for all the questions. I'm still trying to learn this whole setup.

Thank you

This is what I'm getting now.

{
  "index" : "winlogbeat-7.3.0-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "Xzmh5yzfRhOPwRK0gBiujw",
    "name" : "INDY-LOGSRV02",
    "transport_address" : "10.3.200.46:9300",
    "attributes" : {
      "ml.machine_memory" : "68718940160",
      "ml.max_open_jobs" : "20",
      "xpack.installed" : "true",
      "box_type" : "warm"
    },
    "weight_ranking" : 2
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "no",
  "can_rebalance_cluster_decisions" : [
    {
      "decider" : "rebalance_only_when_active",
      "decision" : "NO",
      "explanation" : "rebalancing is not allowed until all replicas in the cluster are active"
    },
    {
      "decider" : "cluster_rebalance",
      "decision" : "NO",
      "explanation" : "the cluster has unassigned shards and cluster setting [cluster.routing.allocation.allow_rebalance] is set to [indices_all_active]"
    }
  ],
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "rebalancing is not allowed",
  "node_allocation_decisions" : [
    {
      "node_id" : "GqP1krHiTX6Iqe0t9bzvaQ",
      "node_name" : "INDY-LOGSRV06",
      "transport_address" : "10.3.200.50:9300",
      "node_attributes" : {
        "ml.machine_memory" : "68718940160",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "box_type" : "hot"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        }
      ]
    },
    {
      "node_id" : "oIyIKckYTAia9O8L8DYFRw",
      "node_name" : "INDY-LOGSRV05",
      "transport_address" : "10.3.200.49:9300",
      "node_attributes" : {
        "ml.machine_memory" : "25756258304",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "box_type" : "cold"
      },
      "node_decision" : "no",
      "weight_ranking" : 3,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        }
      ]
    },
    {
      "node_id" : "0j54YASqTNOnOKXpr4Nz5A",
      "node_name" : "INDY-LOGSRV01",
      "transport_address" : "10.3.200.45:9300",
      "node_attributes" : {
        "ml.machine_memory" : "68718940160",
        "xpack.installed" : "true",
        "box_type" : "hot",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "weight_ranking" : 4,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        }
      ]
    }
  ]
}

Will this request fix my issue?

PUT _cluster/settings
{ 
  "persistent" : {
    "cluster.routing.allocation.allow_rebalance" : "always"
  }
}

The shard for which you've shown allocation information is allocated already and doesn't need to be moved.

POST _cluster/allocation/explain?include_yes_decisions=true

will pick an unallocated shard and show all the conditions evaluated when making an allocation decision for it.

After running that command I get the following. Will setting cluster.routing.allocation.awareness.attributes on each node solve my issue? From my limited knowledge, I think this is what I am missing.

{
  "index" : "packetbeat-7.3.0-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2019-11-06T20:31:04.071Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "0j54YASqTNOnOKXpr4Nz5A",
      "node_name" : "INDY-LOGSRV01",
      "transport_address" : "10.3.200.45:9300",
      "node_attributes" : {
        "ml.machine_memory" : "68718940160",
        "xpack.installed" : "true",
        "box_type" : "hot",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "YES",
          "explanation" : "shard has no previous failures"
        },
        {
          "decider" : "replica_after_primary_active",
          "decision" : "YES",
          "explanation" : "primary shard for this replica is already active"
        },
        {
          "decider" : "enable",
          "decision" : "YES",
          "explanation" : "all allocations are allowed"
        },
        {
          "decider" : "node_version",
          "decision" : "YES",
          "explanation" : "can allocate replica shard to a node with version [7.3.0] since this is equal-or-newer than the primary version [7.3.0]"
        },
        {
          "decider" : "snapshot_in_progress",
          "decision" : "YES",
          "explanation" : "the shard is not being snapshotted"
        },
        {
          "decider" : "restore_in_progress",
          "decision" : "YES",
          "explanation" : "ignored as shard is not being recovered from a snapshot"
        },
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        },
        {
          "decider" : "same_shard",
          "decision" : "YES",
          "explanation" : "the shard does not exist on the same node"
        },
        {
          "decider" : "disk_threshold",
          "decision" : "YES",
          "explanation" : "enough disk for shard on node, free: [283.8gb], shard size: [0b], free after allocating shard: [283.8gb]"
        },
        {
          "decider" : "throttling",
          "decision" : "YES",
          "explanation" : "below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2]"
        },
        {
          "decider" : "shards_limit",
          "decision" : "YES",
          "explanation" : "total shard limits are disabled: [index: -1, cluster: -1] <= 0"
        },
        {
          "decider" : "awareness",
          "decision" : "YES",
          "explanation" : "allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it"
        }
      ]
    }       

Thank you for your help.

Did your response body get cut off? That's not the complete response.
What can be determined from the partial response:

This part should be self-explanatory:

...
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
...

The response should then walk through all the nodes, with allocation explanations for each.
The partial output above, though, only includes a single node. It's a hot node, and the index requires warm.

...
      "node_name" : "INDY-LOGSRV01",
      "transport_address" : "10.3.200.45:9300",
      "node_attributes" : {
        "ml.machine_memory" : "68718940160",
        "xpack.installed" : "true",
        "box_type" : "hot",
        "ml.max_open_jobs" : "20"
      },
...
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"warm"]"""
        },
...

If the response is too large to post here in its entirety, try omitting ?include_yes_decisions=true. It's nice to see all the deciders, but the NO decision is the most important one.

Also, can you check the size of index packetbeat-7.3.0-000001?
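Something like the cat indices API will show it (a sketch; the column list is just a suggestion):

```
GET _cat/indices/packetbeat-7.3.0-000001?v&h=index,pri,rep,docs.count,store.size
```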