Hot/Warm Index Allocation Failing due to Other Shard Allocation

I have a hot/warm architecture in ES 5.6.2. It is in place for certain high-IO indices, primarily the Nginx ones. After two days, I use Curator to update the index allocation requirements from:

            "require": {
              "box_type": "hot"
            }

to:

            "require": {
              "box_type": "warm"
            }

to force the indices to be allocated away from my precious SSDs and onto spinning disks. This worked fantastically until recently, when I lost a node and the whole cluster rebalanced. Now other indices take allocation priority over my Nginx indices, so the SSDs fill up and ingestion comes to a grinding halt.
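For reference, the change above corresponds to an index-level settings update. Here is a minimal sketch of the request body, assuming the standard index.routing.allocation.<type>.<attribute> setting name. The helper allocation_settings is hypothetical; it only builds the body and does not contact the cluster.

```python
import json


def allocation_settings(key, value, allocation_type="require"):
    """Build an index settings body for shard allocation filtering.

    Hypothetical helper for illustration; it does not talk to Elasticsearch.
    """
    if allocation_type not in ("require", "include", "exclude"):
        raise ValueError("unknown allocation type: %s" % allocation_type)
    setting = "index.routing.allocation.%s.%s" % (allocation_type, key)
    return {setting: value}


# Body for moving an index from the hot nodes to the warm nodes.
body = allocation_settings("box_type", "warm")
print(json.dumps(body))
```

Sending this body to an index's _settings endpoint should have the same effect as the "require" block shown above.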

Does anyone know how to prioritize the allocation of these indices? More importantly, I'm looking to prioritize the hot => warm relocation of Nginx indices, which are also indices created within the last few days. I'm looking at the shard balancing settings, but I'm unsure whether I can do anything other than make all Nginx indices the highest priority. Ideally, Nginx indices on HDDs ("box_type": "warm") should have lower priority than Nginx indices on SSDs ("box_type": "hot").

When I run Curator with curator --config curator.yml allocate-nginx-3day.yml, I get the following debug output:

2018-01-23 10:17:56,407 DEBUG              curator.utils           health_check:1492 KWARGS= "{'relocating_shards': 0}"
2018-01-23 10:17:56,418 DEBUG              curator.utils           health_check:1506 NO MATCH: Value for key "0", health check data: 2
2018-01-23 10:17:56,418 DEBUG              curator.utils            wait_for_it:1726 Response: False
2018-01-23 10:17:56,418 DEBUG              curator.utils            wait_for_it:1746 Action "allocation" not yet complete, 1015 total seconds elapsed. Waiting 3 seconds before checking again.
2018-01-23 10:17:59,419 DEBUG              curator.utils            wait_for_it:1723 Elapsed time: 1018 seconds
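As far as I can read the trace, Curator is waiting for relocating_shards to reach 0 while the health check data reports 2, so the allocation action never counts as complete. The sketch below mimics that comparison. The function health_matches is hypothetical and is not Curator's actual code; the sample values are the ones from the log lines above.

```python
def health_matches(health, **expected):
    """Return True only if every expected key has the expected value."""
    for key, want in expected.items():
        if key not in health:
            raise KeyError(key)
        if health[key] != want:
            return False
    return True


# Values from the trace: Curator expects 0, the health check data says 2.
sample_health = {"relocating_shards": 2}
print(health_matches(sample_health, relocating_shards=0))  # False
```

If that count covers the whole cluster rather than only the Nginx indices (an assumption on my part), relocations of other indices would keep the wait loop running too.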

My Curator config (curator.yml):

---
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: DEBUG
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Allocation action file (allocate-nginx-3day.yml):

actions:
  1:
    action: allocation
    description: "Apply shard allocation filtering rules to the specified indices"
    options:
      key: box_type
      value: warm
      allocation_type: require
      wait_for_completion: true
      timeout_override:
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: nginx-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2
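To make the intent of the two filters concrete, here is a simplified sketch of the selection they describe: keep names starting with nginx- whose date, parsed with '%Y.%m.%d', is more than 2 days before now. This is an approximation, not Curator's implementation. The function select_indices and the index names are made up for illustration, and it assumes the date follows the prefix directly.

```python
from datetime import datetime, timedelta


def select_indices(names, now, prefix="nginx-", timestring="%Y.%m.%d", days=2):
    """Return names with the prefix whose embedded date is older than `days`."""
    cutoff = now - timedelta(days=days)
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue  # pattern filter: wrong prefix
        try:
            stamp = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # no parseable date after the prefix
        if stamp < cutoff:
            selected.append(name)  # age filter: older than the cutoff
    return selected


# Hypothetical index names; "now" is the timestamp from the trace above.
names = ["nginx-2018.01.19", "nginx-2018.01.20", "nginx-2018.01.22", "syslog-2018.01.19"]
now = datetime(2018, 1, 23, 10, 17, 56)
print(select_indices(names, now))  # ['nginx-2018.01.19', 'nginx-2018.01.20']
```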

Cluster settings:

{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "allow_rebalance": "always",
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "95%",
              "high": "98%"
            }
          }
        }
      }
    }
  },
  "transient": {}
}
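For anyone comparing these against the documentation, the nested settings map to dotted setting names such as cluster.routing.allocation.disk.watermark.low. Here is a small sketch that flattens the persistent block shown above; the flatten helper is hypothetical.

```python
def flatten(settings, parent=""):
    """Flatten nested settings into dotted setting names."""
    flat = {}
    for key, value in settings.items():
        path = parent + "." + key if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# The "persistent" block from the cluster settings above.
persistent = {
    "cluster": {"routing": {"allocation": {
        "allow_rebalance": "always",
        "cluster_concurrent_rebalance": "2",
        "node_concurrent_recoveries": "2",
        "disk": {"watermark": {"low": "95%", "high": "98%"}},
    }}}
}

for name, value in sorted(flatten(persistent).items()):
    print(name, "=", value)
```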
