Topic changed from: Bad shard allocation logic due to data tier architecture -> node.roles syntax issue in Elasticsearch >8.0.0

Hi,
Today I observed unexpected shard allocation behaviour.
My configuration uses a data tier structure like this:
coordinator nodes - responsible for handling incoming data traffic
es_data_ssd* nodes - store the hot phase (ILM is configured and these data nodes have the hot role)
es_data_hdd* nodes - store the warm phase
Why is Elasticsearch allocating data to all nodes at once, including the coordinator nodes?

Elasticsearch version 7.16.1.
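
For reference, the two listings below look like cat API output; something along these lines would reproduce them (exact parameters are my guess):

GET _cat/shards/logstash-data_test_load-2022.03.21?v
GET _cat/nodes?v&h=name,node.role&s=name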

logstash-data_test_load-2022.03.21                   9     p      STARTED   39956  28945 10.0.9.67  es_data_hdd_2_2
logstash-data_test_load-2022.03.21                   10    p      STARTED   39838  28850 10.0.9.89  es_data_hdd_5_2
logstash-data_test_load-2022.03.21                   7     p      STARTED   39924  28799 10.0.9.44  es_data_ssd_1_1
logstash-data_test_load-2022.03.21                   11    p      STARTED   40027  28948 10.0.9.118 es_data_ssd_1_3
logstash-data_test_load-2022.03.21                   5     p      STARTED   39717  28775 10.0.9.48  es_data_ssd_5_1
logstash-data_test_load-2022.03.21                   12    p      STARTED   40039  28978 10.0.9.120 es_data_hdd_4_3
logstash-data_test_load-2022.03.21                   1     p      STARTED   39897  28970 10.0.9.124 es_data_ssd_5_3
logstash-data_test_load-2022.03.21                   13    p      STARTED   40517  29312 10.0.9.97  es_data_hdd_6_2
logstash-data_test_load-2022.03.21                   2     p      STARTED   39974  28913 10.0.9.79  es_data_ssd_5_2
logstash-data_test_load-2022.03.21                   14    p      STARTED   40114  29002 10.0.9.85  es_data_ssd_1_2
logstash-data_test_load-2022.03.21                   8     p      STARTED   40102  29040 10.0.9.130 es_data_hdd_6_3
logstash-data_test_load-2022.03.21                   6     p      STARTED   39736  28825 10.0.9.56  es_data_hdd_2_1
logstash-data_test_load-2022.03.21                   3     p      STARTED   39874  28838 10.0.9.71  es_data_hdd_4_2
logstash-data_test_load-2022.03.21                   4     p      STARTED   40178  29054 10.0.9.93  es_coordination_2
logstash-data_test_load-2022.03.21                   0     p      STARTED   39873  28892 10.0.9.25  es_coordination_1
name                   node.role
es_coordination_1      cdfhimrstw
es_coordination_2      cdfhimrstw
es_coordination_3      cdfhimrstw
es_data_hdd_1_1        sw
es_data_hdd_1_2        sw
es_data_hdd_1_3        sw
es_data_hdd_2_1        sw
es_data_hdd_2_2        sw
es_data_hdd_2_3        sw
es_data_hdd_3_1        sw
es_data_hdd_3_2        sw
es_data_hdd_3_3        sw
es_data_hdd_4_1        sw
es_data_hdd_4_2        sw
es_data_hdd_4_3        sw
es_data_hdd_5_1        sw
es_data_hdd_5_2        sw
es_data_hdd_5_3        sw
es_data_hdd_6_1        sw
es_data_hdd_6_2        sw
es_data_hdd_6_3        sw
es_data_hdd_7_1        c
es_data_hdd_7_2        c
es_data_hdd_7_3        c
es_data_hdd_8_1        c
es_data_hdd_8_2        c
es_data_hdd_8_3        c
es_data_hdd_9_1        c
es_data_hdd_9_2        c
es_data_hdd_9_3        c
es_data_ssd_1_1        hs
es_data_ssd_1_2        hs
es_data_ssd_1_3        hs
es_data_ssd_2_1        hs
es_data_ssd_2_2        hs
es_data_ssd_2_3        hs
es_data_ssd_3_1        hs
es_data_ssd_3_1_ingest i
es_data_ssd_3_2        hs
es_data_ssd_3_2_ingest i
es_data_ssd_3_3        hs
es_data_ssd_3_3_ingest i
es_data_ssd_4_1        hs
es_data_ssd_4_2        hs
es_data_ssd_4_3        hs
es_data_ssd_5_1        hs
es_data_ssd_5_2        hs
es_data_ssd_5_3        hs
es_master_1_1          m
es_master_1_2          m
es_master_1_3          m
es_master_2_1          m
es_master_2_2          m
es_master_2_3          m

Your coordination nodes have all roles assigned and therefore hold data.

The warm data nodes are designated as warm as well as content nodes. I wonder if the content node status is having an undesired impact here? I would recommend making them just warm nodes.

Your hot data nodes are also content nodes. I would recommend making them pure hot data nodes and seeing if that has any impact.
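
As a sketch of what those recommendations imply for node.roles (these are the standard data tier role names; adjust to your own layout):

# warm data nodes (es_data_hdd*)
node.roles: [ data_warm ]

# hot data nodes (es_data_ssd*)
node.roles: [ data_hot ]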

What is the rationale behind having 6 dedicated master nodes? Usually 3 is recommended and sufficient.
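
For illustration only, one dedicated master-eligible node per host would look roughly like this (a sketch reusing the node names from this thread):

# on each of the three hosts, keep a single dedicated master node
node.name: es_master_1_1
node.roles: [ master ]

cluster.initial_master_nodes: [ es_master_1_1, es_master_1_2, es_master_1_3 ]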

Thanks for the tips. My cluster consists of 3 hosts, and on each host I run 18 nodes, as you can see in my first comment. My intention was to have redundancy for the master nodes. Do you have any suggestion on this point? Could it be one master node per host?
So, at a minimum, how should I define

- node.roles=

for a coordinator node?

Can someone from the Elastic team explain how a coordinator node should be configured from a roles perspective (it shouldn't store any data)?

From the documentation link that you shared.

To create a dedicated coordinating node, set:

node.roles: [ ]

So, you need to set the node.roles as an empty array and your node will be a coordinating node only.
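
For reference, a minimal elasticsearch.yml sketch for a coordinating-only node (node and cluster names taken from this thread):

node.name: es_coordination_1
cluster.name: elk_cluster
node.roles: []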

Ha, but it was defined as in the documentation and it still stores data. Why?

A coordinator node should be responsible for load balancing traffic to the other nodes (but not storing data), am I right?

" If you take away the ability to be able to handle master duties, to hold data, and pre-process documents, then you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing. Essentially, coordinating only nodes behave as smart load balancers.

Coordinating only nodes can benefit large clusters by offloading the coordinating node role from data and master-eligible nodes. They join the cluster and receive the full cluster state, like every other node, and they use the cluster state to route requests directly to the appropriate place(s)"

Can Elastic fix this documentation? It doesn't make sense. I've double-checked, and with "- node.roles=" the node still holds data.

Please share your entire elasticsearch.yml for your coordinating nodes.

It needs to be set exactly as:

node.roles: []

So it was deployed from a YAML file:

version: "3.7"

services:
  es_coordination_1:
    container_name: es_coordination_1
    image: priv_repo/elk-docker/elasticsearch/elasticsearch:8.1.0
    user: elasticsearch
    environment:
      - node.name=es_coordination_1
      - xpack.ml.enabled=false
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.http.ssl.key=certs/es_coordination_1/es_coordination_1.key
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.http.ssl.certificate=certs/es_coordination_1/es_coordination_1.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.certificate=certs/es_coordination_1/es_coordination_1.crt
      - xpack.security.transport.ssl.key=certs/es_coordination_1/es_coordination_1.key
      - xpack.license.self_generated.type=basic
      - ELASTIC_USERNAME=elastic
      - ELASTIC_PASSWORD=changeme
      - cluster.name=elk_cluster
      - discovery.seed_hosts=es_master_1_1,es_master_2_1,es_master_1_2,es_master_2_2,es_master_1_3,es_master_2_3,es_data_ssd_3_1_ingest,es_data_ssd_1_1,es_data_ssd_2_1,es_data_ssd_3_1,es_data_ssd_4_1,es_data_ssd_5_1,es_data_hdd_1_1,es_data_hdd_2_1,es_data_hdd_3_1,es_data_hdd_4_1,es_data_hdd_5_1,es_data_hdd_6_1,es_data_hdd_7_1,es_data_hdd_8_1,es_data_hdd_9_1,es_coordination_2,es_master_1_2,es_master_2_2,es_data_ssd_3_2_ingest,es_data_ssd_1_2,es_data_ssd_2_2,es_data_ssd_3_2,es_data_ssd_4_2,es_data_ssd_5_2,es_data_hdd_1_2,es_data_hdd_2_2,es_data_hdd_3_2,es_data_hdd_4_2,es_data_hdd_5_2,es_data_hdd_6_2,es_data_hdd_7_2,es_data_hdd_8_2,es_data_hdd_9_2,es_coordination_3,es_master_1_3,es_master_2_3,es_data_ssd_3_3_ingest,es_data_ssd_1_3,es_data_ssd_2_3,es_data_ssd_3_3,es_data_ssd_4_3,es_data_ssd_5_3,es_data_hdd_1_3,es_data_hdd_2_3,es_data_hdd_3_3,es_data_hdd_4_3,es_data_hdd_5_3,es_data_hdd_6_3,es_data_hdd_7_3,es_data_hdd_8_3,es_data_hdd_9_3
      - cluster.initial_master_nodes=es_master_1_1,es_master_2_1,es_master_1_2,es_master_2_2,es_master_1_3,es_master_2_3
      - ingest.geoip.downloader.enabled=false
      - bootstrap.memory_lock=true
      - node.roles= 

However, this syntax results in an unexpected node roles configuration, so there must be some workaround for it.
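
A quick way to confirm which roles a node actually ended up with is the same cat API as above:

GET _cat/nodes?v&h=name,node.role

A coordinating-only node should report "-" in node.role; here the coordination nodes report cdfhimrstw, i.e. essentially every role.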

Yes, Elastic should fix this bug.
For now we need to use a workaround in the docker compose environment section (I also didn't find this tip in the documentation).
The workaround works for Elasticsearch <7.17.1.

workaround:

  1. Create a docker config for the coordinator with the legacy role settings:

node.master: false
node.data: false
node.ingest: false
node.ml: false
node.remote_cluster_client: false

  2. Map this config in docker compose:

    configs:
      - source: es-coordination
        target: /usr/share/elasticsearch/config/elasticsearch.yml

configs:
  es-coordination:
    name: es-coordination
    file: /home/Elasticsearch/kickstart_elk_cluster/elasticsearch_coordinator.yml

and you will see in the logs:
{"type": "deprecation.elasticsearch", "timestamp": "2022-03-22T15:11:18,898Z", "level": "CRITICAL", "component": "o.e.d.n.Node", "cluster.name": "elk_cluster", "node.name": "es_coordination_1", "message": "legacy role settings [node.data, node.remote_cluster_client, node.ingest, node.master, node.ml] are deprecated, use [node.roles=[]]", "key": "legacy role settings", "category": "settings" }

@leandrojmp Can you give an example of how I should define this in my docker compose file:
node.roles: []

Any suggestion on how we can resolve this syntax issue with node.roles: [] in docker compose?

I'm sorry, I do not use docker and this seems to be a bug.

I would suggest that you open an issue or comment on the one that you shared; however, that issue was closed by this commit, so it should be working.

It probably has to be changed to:

- node.roles=[]

It was fixed by #85186 (Allow yaml values for dynamic node settings by rjernst · Pull Request #85186 · elastic/elasticsearch · GitHub) for 8.2.0, but a release of 8.2.0 is not out yet.
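
Once 8.2.0 is available, an environment entry like the one suggested above is expected to work (untested sketch based on that PR):

    environment:
      - node.roles=[]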
