Topic changed from: Bad shard allocation logic due to data tier architecture -> node.roles syntax issue in Elasticsearch >8.0.0

Hi,
Today I observed unexpected shard allocation behaviour.
My configuration uses a data tier structure like this:
coordinator nodes - responsible for handling incoming data traffic
es_data_ssd* nodes - store the hot phase (ILM is configured and these data nodes have the hot role)
es_data_hdd* nodes - store the warm phase
Why is Elasticsearch allocating data to all nodes at once, including the coordinator nodes?

Elasticsearch version 7.16.1.
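
For reference, the two listings below look like cat API output; something along these lines would reproduce them (exact parameters are my guess):

GET _cat/shards/logstash-data_test_load-2022.03.21?v
GET _cat/nodes?v&h=name,node.role&s=name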

logstash-data_test_load-2022.03.21                   9     p      STARTED   39956  28945 10.0.9.67  es_data_hdd_2_2
logstash-data_test_load-2022.03.21                   10    p      STARTED   39838  28850 10.0.9.89  es_data_hdd_5_2
logstash-data_test_load-2022.03.21                   7     p      STARTED   39924  28799 10.0.9.44  es_data_ssd_1_1
logstash-data_test_load-2022.03.21                   11    p      STARTED   40027  28948 10.0.9.118 es_data_ssd_1_3
logstash-data_test_load-2022.03.21                   5     p      STARTED   39717  28775 10.0.9.48  es_data_ssd_5_1
logstash-data_test_load-2022.03.21                   12    p      STARTED   40039  28978 10.0.9.120 es_data_hdd_4_3
logstash-data_test_load-2022.03.21                   1     p      STARTED   39897  28970 10.0.9.124 es_data_ssd_5_3
logstash-data_test_load-2022.03.21                   13    p      STARTED   40517  29312 10.0.9.97  es_data_hdd_6_2
logstash-data_test_load-2022.03.21                   2     p      STARTED   39974  28913 10.0.9.79  es_data_ssd_5_2
logstash-data_test_load-2022.03.21                   14    p      STARTED   40114  29002 10.0.9.85  es_data_ssd_1_2
logstash-data_test_load-2022.03.21                   8     p      STARTED   40102  29040 10.0.9.130 es_data_hdd_6_3
logstash-data_test_load-2022.03.21                   6     p      STARTED   39736  28825 10.0.9.56  es_data_hdd_2_1
logstash-data_test_load-2022.03.21                   3     p      STARTED   39874  28838 10.0.9.71  es_data_hdd_4_2
logstash-data_test_load-2022.03.21                   4     p      STARTED   40178  29054 10.0.9.93  es_coordination_2
logstash-data_test_load-2022.03.21                   0     p      STARTED   39873  28892 10.0.9.25  es_coordination_1
name                   node.role
es_coordination_1      cdfhimrstw
es_coordination_2      cdfhimrstw
es_coordination_3      cdfhimrstw
es_data_hdd_1_1        sw
es_data_hdd_1_2        sw
es_data_hdd_1_3        sw
es_data_hdd_2_1        sw
es_data_hdd_2_2        sw
es_data_hdd_2_3        sw
es_data_hdd_3_1        sw
es_data_hdd_3_2        sw
es_data_hdd_3_3        sw
es_data_hdd_4_1        sw
es_data_hdd_4_2        sw
es_data_hdd_4_3        sw
es_data_hdd_5_1        sw
es_data_hdd_5_2        sw
es_data_hdd_5_3        sw
es_data_hdd_6_1        sw
es_data_hdd_6_2        sw
es_data_hdd_6_3        sw
es_data_hdd_7_1        c
es_data_hdd_7_2        c
es_data_hdd_7_3        c
es_data_hdd_8_1        c
es_data_hdd_8_2        c
es_data_hdd_8_3        c
es_data_hdd_9_1        c
es_data_hdd_9_2        c
es_data_hdd_9_3        c
es_data_ssd_1_1        hs
es_data_ssd_1_2        hs
es_data_ssd_1_3        hs
es_data_ssd_2_1        hs
es_data_ssd_2_2        hs
es_data_ssd_2_3        hs
es_data_ssd_3_1        hs
es_data_ssd_3_1_ingest i
es_data_ssd_3_2        hs
es_data_ssd_3_2_ingest i
es_data_ssd_3_3        hs
es_data_ssd_3_3_ingest i
es_data_ssd_4_1        hs
es_data_ssd_4_2        hs
es_data_ssd_4_3        hs
es_data_ssd_5_1        hs
es_data_ssd_5_2        hs
es_data_ssd_5_3        hs
es_master_1_1          m
es_master_1_2          m
es_master_1_3          m
es_master_2_1          m
es_master_2_2          m
es_master_2_3          m

Your coordination nodes have all roles assigned and therefore hold data.

The warm data nodes are designated as warm as well as content nodes. I wonder if the content node status is having an undesired impact here? I would recommend making them just warm nodes.

Your hot data nodes are also content nodes. I would recommend making them pure hot data nodes and seeing if that has any impact.
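
As a sketch of what those recommendations imply for node.roles (these are the standard data tier role names; adjust to your own layout):

# warm data nodes (es_data_hdd*)
node.roles: [ data_warm ]

# hot data nodes (es_data_ssd*)
node.roles: [ data_hot ]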

What is the rationale behind having 6 dedicated master nodes? Usually 3 is recommended and sufficient.
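
For illustration only, one dedicated master-eligible node per host would look roughly like this (a sketch reusing the node names from this thread):

# on each of the three hosts, keep a single dedicated master node
node.name: es_master_1_1
node.roles: [ master ]

cluster.initial_master_nodes: [ es_master_1_1, es_master_1_2, es_master_1_3 ]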

Thanks for the tips. My cluster consists of 3 hosts, and on each host I run 18 nodes, as you can see in my first comment. My intention was to have redundancy for the master nodes. Do you have any suggestion on this point? Could it be one master node per host?
So, at a minimum, how should I define

- node.roles=

for a coordinator node?

Can someone from the Elastic team explain how a coordinator node should be configured from a roles perspective (it shouldn't store any data)?

From the documentation link that you shared.

To create a dedicated coordinating node, set:

node.roles: [ ]

So, you need to set the node.roles as an empty array and your node will be a coordinating node only.
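
For reference, a minimal elasticsearch.yml sketch for a coordinating-only node (node and cluster names taken from this thread):

node.name: es_coordination_1
cluster.name: elk_cluster
node.roles: []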

Ha, but it was defined as in the documentation and it still stores data. Why?

A coordinator node should be responsible for load balancing traffic to the other nodes (but not storing data), am I right?

" If you take away the ability to be able to handle master duties, to hold data, and pre-process documents, then you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing. Essentially, coordinating only nodes behave as smart load balancers.

Coordinating only nodes can benefit large clusters by offloading the coordinating node role from data and master-eligible nodes. They join the cluster and receive the full cluster state, like every other node, and they use the cluster state to route requests directly to the appropriate place(s)"

Can Elastic fix this documentation? It doesn't make sense. I've double-checked, and with "- node.roles=" the node still holds data.

Please share your entire elasticsearch.yml for your coordinating nodes.

It needs to be set exactly as:

node.roles: []

So it was deployed from a YAML file:

version: "3.7"

services:
  es_coordination_1:
    container_name: es_coordination_1
    image: priv_repo/elk-docker/elasticsearch/elasticsearch:8.1.0
    user: elasticsearch
    environment:
      - node.name=es_coordination_1
      - xpack.ml.enabled=false
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.http.ssl.key=certs/es_coordination_1/es_coordination_1.key
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.http.ssl.certificate=certs/es_coordination_1/es_coordination_1.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.certificate=certs/es_coordination_1/es_coordination_1.crt
      - xpack.security.transport.ssl.key=certs/es_coordination_1/es_coordination_1.key
      - xpack.license.self_generated.type=basic
      - ELASTIC_USERNAME=elastic
      - ELASTIC_PASSWORD=changeme
      - cluster.name=elk_cluster
      - discovery.seed_hosts=es_master_1_1,es_master_2_1,es_master_1_2,es_master_2_2,es_master_1_3,es_master_2_3,es_data_ssd_3_1_ingest,es_data_ssd_1_1,es_data_ssd_2_1,es_data_ssd_3_1,es_data_ssd_4_1,es_data_ssd_5_1,es_data_hdd_1_1,es_data_hdd_2_1,es_data_hdd_3_1,es_data_hdd_4_1,es_data_hdd_5_1,es_data_hdd_6_1,es_data_hdd_7_1,es_data_hdd_8_1,es_data_hdd_9_1,es_coordination_2,es_master_1_2,es_master_2_2,es_data_ssd_3_2_ingest,es_data_ssd_1_2,es_data_ssd_2_2,es_data_ssd_3_2,es_data_ssd_4_2,es_data_ssd_5_2,es_data_hdd_1_2,es_data_hdd_2_2,es_data_hdd_3_2,es_data_hdd_4_2,es_data_hdd_5_2,es_data_hdd_6_2,es_data_hdd_7_2,es_data_hdd_8_2,es_data_hdd_9_2,es_coordination_3,es_master_1_3,es_master_2_3,es_data_ssd_3_3_ingest,es_data_ssd_1_3,es_data_ssd_2_3,es_data_ssd_3_3,es_data_ssd_4_3,es_data_ssd_5_3,es_data_hdd_1_3,es_data_hdd_2_3,es_data_hdd_3_3,es_data_hdd_4_3,es_data_hdd_5_3,es_data_hdd_6_3,es_data_hdd_7_3,es_data_hdd_8_3,es_data_hdd_9_3
      - cluster.initial_master_nodes=es_master_1_1,es_master_2_1,es_master_1_2,es_master_2_2,es_master_1_3,es_master_2_3
      - ingest.geoip.downloader.enabled=false
      - bootstrap.memory_lock=true
      - node.roles= 

However, this syntax results in an unexpected node roles configuration, so there must be some workaround for it.
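
A quick way to confirm which roles a node actually ended up with is the same cat API as above:

GET _cat/nodes?v&h=name,node.role

A coordinating-only node should report "-" in node.role; here the coordination nodes report cdfhimrstw, i.e. essentially every role.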

Yes, Elastic should fix this bug.
For now we need to use a workaround in the docker compose environment section (I also didn't find this tip in the documentation).
The workaround works for Elasticsearch <7.17.1.

workaround:

  1. Create a docker config for the coordinator with the legacy role settings:

node.master: false
node.data: false
node.ingest: false
node.ml: false
node.remote_cluster_client: false

  2. Map this config in docker compose:

    configs:
      - source: es-coordination
        target: /usr/share/elasticsearch/config/elasticsearch.yml

configs:
  es-coordination:
    name: es-coordination
    file: /home/Elasticsearch/kickstart_elk_cluster/elasticsearch_coordinator.yml

and you will see in the logs:
{"type": "deprecation.elasticsearch", "timestamp": "2022-03-22T15:11:18,898Z", "level": "CRITICAL", "component": "o.e.d.n.Node", "cluster.name": "elk_cluster", "node.name": "es_coordination_1", "message": "legacy role settings [node.data, node.remote_cluster_client, node.ingest, node.master, node.ml] are deprecated, use [node.roles=[]]", "key": "legacy role settings", "category": "settings" }

@leandrojmp Can you give an example of how I should define this in my docker compose file:
node.roles: []

Any suggestion on how we can resolve this syntax issue with node.roles: [] in docker compose?

I'm sorry, I do not use docker and this seems to be a bug.

I would suggest that you open an issue or comment on the one that you shared; however, that issue was closed by this commit, so it should be working.

It probably has to be changed to:

- node.roles=[]

It was fixed by #85186 (Allow yaml values for dynamic node settings by rjernst · Pull Request #85186 · elastic/elasticsearch · GitHub) for 8.2.0, but a release of 8.2.0 is not out yet.
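
Once 8.2.0 is available, an environment entry like the one suggested above is expected to work (untested sketch based on that PR):

    environment:
      - node.roles=[]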
