I am trying to understand the logic behind the static settings.
I am trying to keep all nodes with the same configuration/settings where possible, but I'm curious and want to confirm the right way of doing it.
Let's say I want to set a custom indices.memory.index_buffer_size (30%), and I have three types of nodes in my cluster:
master nodes (x3)
data nodes for serving (x5)
data nodes for indexing (x10)
Ideally, I want that configuration only on the indexing data nodes (the third group), and that is what I did.
As far as I know, this kind of configuration is called "cluster_settings", which (by my intuition) means it applies to the entire cluster and not to a specific node.
I am using Cerebro to monitor my cluster. When I open the cluster_settings page in Cerebro and refresh it, I see this static value "jumping" between 10% (the default) and 30% (my custom setting). I guess it depends on which node answers.
Is that OK? Will it ensure that my setting affects the indexing nodes?
Well, the documentation clearly states that this config must be set on every data node in the cluster.
I'm not sure whether setting different values on different classes of data node will have an impact on performance, but what you describe from your monitoring is consistent with different nodes having different configs.
If you do not set it on the data nodes you use for serving, they will use the default value; it is a per-node setting.
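Because the value is resolved per node against that node's own heap, the same setting can give each node a different buffer. A minimal Python sketch of that arithmetic, assuming the documented semantics of indices.memory.index_buffer_size (default 10% of heap), indices.memory.min_index_buffer_size (default 48mb) and indices.memory.max_index_buffer_size (default unbounded). The function name and the 8 GiB heap are hypothetical, and absolute sizes such as "512mb" are not parsed (only raw byte counts).

```python
from typing import Optional, Union

MIB = 1024 ** 2


def effective_index_buffer_bytes(
    heap_bytes: int,
    index_buffer_size: Union[str, int] = "10%",
    min_bytes: int = 48 * MIB,
    max_bytes: Optional[int] = None,
) -> int:
    """Approximate the indexing buffer a single node derives from its own heap.

    Hypothetical helper: mirrors the documented behaviour of the
    indices.memory.* settings, not Elasticsearch's actual code.
    """
    if isinstance(index_buffer_size, str) and index_buffer_size.endswith("%"):
        percent = float(index_buffer_size[:-1])
        size = int(heap_bytes * percent / 100)
        # The min/max bounds only apply when the size is given as a percentage.
        size = max(size, min_bytes)
        if max_bytes is not None:
            size = min(size, max_bytes)
        return size
    # Simplification: treat anything else as an absolute size in bytes.
    return int(index_buffer_size)


heap = 8 * 1024 ** 3  # hypothetical 8 GiB heap on every data node
print(effective_index_buffer_bytes(heap, "30%"))  # indexing node: 2576980377
print(effective_index_buffer_bytes(heap))         # serving node (default): 858993459
```

With identical heaps, the indexing nodes would get roughly three times the buffer of the serving nodes, and each node only knows its own value.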
I'm not familiar with Cerebro but this sounds like it's doing something wrong, or at least ill-advised. GET _cluster/settings only shows cluster-wide settings, not node-specific ones. GET _cluster/settings?include_defaults will include the node-level settings from the responding node, but this is mostly only preserved for historical reasons and not all that useful in practice. It (or you) should be using GET _nodes/settings to observe node-level settings.
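A minimal Python sketch of checking the node-level value through GET _nodes/settings, assuming an unsecured cluster reachable at http://localhost:9200 (add authentication/TLS as your setup requires). It requests flat_settings=true so the setting appears as a single dotted key. It also treats a missing key as the 10% default, on the assumption that the setting is not listed for a node where it was never configured. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict

SETTING = "indices.memory.index_buffer_size"
DEFAULT_INDEX_BUFFER_SIZE = "10%"  # documented default


def fetch_node_settings(base_url: str = "http://localhost:9200") -> dict:
    """Call GET _nodes/settings with flat (dotted) keys; assumes no auth."""
    url = f"{base_url}/_nodes/settings?flat_settings=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def index_buffer_by_node(nodes_settings: dict) -> Dict[str, str]:
    """Map each node's name to its configured (or default) index buffer size."""
    result = {}
    for node in nodes_settings.get("nodes", {}).values():
        settings = node.get("settings", {})
        result[node["name"]] = settings.get(SETTING, DEFAULT_INDEX_BUFFER_SIZE)
    return result


# Usage (needs a reachable cluster):
#   for name, value in sorted(index_buffer_by_node(fetch_node_settings()).items()):
#       print(f"{name}: {value}")
```

For the cluster described above you would expect the indexing nodes to report 30% and the serving nodes 10%, all in one response, instead of depending on which node happens to answer.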
I am still confused. I have dedicated data nodes for indexing and dedicated data nodes for serving.
I would like this specific setting only on the indexing data nodes. Is that possible?
I see that the indexing data nodes changed to 30% and the serving data nodes stayed at the default (10%). Does that mean it is going to work? The name "cluster/settings" is confusing, because it seems to allow different settings per node.
I tried the endpoint you suggested, GET _cluster/settings?include_defaults, and the result depends on which node I query. If I curl an indexing data node I get 30%, and if I query a serving data node I get 10%.
I am no expert, but I think this is a question about terminology. As I understand it, there are at least 4 terms that need to be understood.
Startup settings. These are some of the settings that you put in your elasticsearch.yml file. They are read when the node starts up, but as soon as it connects to a cluster, it will read and use the shared settings that the cluster has agreed on.
Persistent cluster settings. These are displayed via the _cluster/settings API. Normally you do not want to change them. If you only want to make a temporary change, you change the following instead:
Transient cluster settings, also displayed via the _cluster/settings API.
Finally there are node-local settings, also set in your elasticsearch.yml file. These values are displayed via the _nodes/settings API.
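These layers can be sketched as a lookup in precedence order. This is a minimal Python illustration, assuming the precedence the Elasticsearch documentation gives for dynamic cluster settings (transient, then persistent, then elasticsearch.yml, then the default). The plain dicts are hypothetical stand-ins for what GET _cluster/settings and a node's elasticsearch.yml contain. A static node setting such as indices.memory.index_buffer_size cannot be set through the cluster settings API, so for it only the last two layers apply, and they are evaluated separately on each node.

```python
from typing import Any, Mapping, Tuple


def resolve_setting(
    name: str,
    transient: Mapping[str, Any],
    persistent: Mapping[str, Any],
    yml: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Tuple[str, Any]:
    """Return (source, value) for a setting, checking layers in precedence order."""
    layers = (
        ("transient", transient),
        ("persistent", persistent),
        ("elasticsearch.yml", yml),
        ("default", defaults),
    )
    for source, layer in layers:
        if name in layer:
            return source, layer[name]
    raise KeyError(name)


# Dynamic cluster-wide setting: a persistent value overrides the default everywhere.
print(resolve_setting("cluster.routing.allocation.enable",
                      {}, {"cluster.routing.allocation.enable": "primaries"},
                      {}, {"cluster.routing.allocation.enable": "all"}))

# Static node setting on an indexing node: only elasticsearch.yml and the default count.
print(resolve_setting("indices.memory.index_buffer_size",
                      {}, {}, {"indices.memory.index_buffer_size": "30%"},
                      {"indices.memory.index_buffer_size": "10%"}))
```

On a serving node the second lookup would find nothing in elasticsearch.yml and fall through to the 10% default, which matches the per-node values reported in this thread.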