Elasticsearch mirrors the Data

Hello,
I have built my first Graylog cluster based on these tutorials


Now I have added another data node. This works as well, but I found out that it just
mirrors the data.

What do I have to do so that when the first data node is full, it writes to the second data node?

Thanks for the help.
Sven

# System Config/Info
All VMs = Ubuntu 20.04.2 LTS
Elasticsearch Version
root@vst-gl-p2:/etc/elasticsearch# curl -XGET 192.168.103.205:9200
{
  "name" : "master-node-2",
  "cluster_name" : "graylog",
  "cluster_uuid" : "oTKbs3GQRsmU_g1I_JGqIQ",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "oss",
    "build_type" : "deb",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}


# Cluster Status
root@vst-gl-p2:/etc/elasticsearch# curl -XGET 192.168.103.205:9200/_cluster/health?pretty
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 68,
  "active_shards" : 68,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}



# First Processing Node
root@vst-gl-p1:/etc/elasticsearch# cat elasticsearch.yml | grep -v '#'
cluster.name: graylog
node.name: master-node-1
node.master: true
node.data: false
path.data: /elastic/data
path.logs: /var/log/elasticsearch
network.host: 192.168.103.204
network.publish_host: 192.168.103.204
http.port: 9200
cluster.initial_master_nodes: ["192.168.103.204", "192.168.103.205", "192.168.103.206", "192.168.103.209"]
discovery.zen.ping.unicast.hosts: ["192.168.103.205", "192.168.103.206"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 4
action.auto_create_index: false

# Second Processing Node
root@vst-gl-p2:/etc/elasticsearch# cat elasticsearch.yml | grep -v '#'
cluster.name: graylog
node.name: master-node-2
node.master: true
node.data: false
path.data: /elastic/data
path.logs: /var/log/elasticsearch
network.host: 192.168.103.205
network.publish_host: 192.168.103.205
http.port: 9200
cluster.initial_master_nodes: ["192.168.103.204", "192.168.103.205", "192.168.103.206", "192.168.103.209"]
discovery.zen.ping.unicast.hosts: ["192.168.103.204", "192.168.103.206"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 4
action.auto_create_index: false

# First Data Node
root@vst-gl-d1:/etc/elasticsearch# cat elasticsearch.yml | grep -v '#'
cluster.name: graylog
node.name: data-node-1
node.master: false
node.data: true
path.data: /elastic/data
path.logs: /var/log/elasticsearch
network.host: 192.168.103.206
network.publish_host: 192.168.103.206
http.port: 9200
cluster.initial_master_nodes: ["192.168.103.204", "192.168.103.205", "192.168.103.206", "192.168.103.209"]
discovery.zen.ping.unicast.hosts: ["192.168.103.204", "192.168.103.205"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 4
action.auto_create_index: false

# Second new Data Node
root@vst-gl-d2:/etc/elasticsearch# cat elasticsearch.yml | grep -v '#'
cluster.name: graylog
node.name: data-node-2
node.master: false
node.data: true
path.data: /elastic/data
path.logs: /var/log/elasticsearch
network.host: 192.168.103.209
network.publish_host: 192.168.103.209
http.port: 9200
cluster.initial_master_nodes: ["192.168.103.204", "192.168.103.205", "192.168.103.206", "192.168.103.209"]
discovery.zen.ping.unicast.hosts: ["192.168.103.204", "192.168.103.205"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 4
action.auto_create_index: false

# NCDU output
# First Data Node vst-gl-d1

ncdu 1.14.1 ~ Use the arrow keys to navigate, press ? for help
--- /elastic/data/nodes/0/indices ----------------------------------------------
    5.4 GiB [##########] /kBw-8a8tQ8C2A7u7O3TuKQ
    4.8 GiB [########  ] /yWZj3ZPoSoyOfT3VtHlbvQ
    2.9 GiB [#####     ] /3llHkfWuRlSgYrYFbODWMA
    2.9 GiB [#####     ] /e_U7K6A7RgKqYNvG-qhBIQ
    2.8 GiB [#####     ] /ZC1-IkaoSSOgeFLAAqR2ZQ
    2.8 GiB [#####     ] /C4IFuIKoQ9qLZZZJkLI-oQ
    1.6 GiB [###       ] /TyiSnxRHQnu029XNw1GzFQ
    1.6 GiB [##        ] /E7PCsPb3T_-71EW-z3xK9A
    1.4 GiB [##        ] /p9vkMLSDR86qS9iEmqGhtA
    1.3 GiB [##        ] /wDCtDLa5TDel7opjmXfn1A
    1.0 GiB [#         ] /TI-9432dRwGJ_JvYQ1d6vQ
  864.7 MiB [#         ] /YZQefigaRMKwCkuh-FfRRA
  711.7 MiB [#         ] /ag4c2ooRR22b9YDGnwDsvQ
   84.0 KiB [          ] /6tC2AmKSTa-TlXh8oSVPcg
   84.0 KiB [          ] /VffKpHTaQyiC6g9lPbo6HA
   84.0 KiB [          ] /PUjnQtAPQBq7yuGnofD9oQ
   84.0 KiB [          ] /5woPZPyEQlyEfdgG3P7kpw
   
# Second new Data Node vst-gl-d2
--- /elastic/data/nodes/0/indices --------------------------------
    5.4 GiB [##########] /kBw-8a8tQ8C2A7u7O3TuKQ
    4.8 GiB [########  ] /yWZj3ZPoSoyOfT3VtHlbvQ
    2.9 GiB [#####     ] /3llHkfWuRlSgYrYFbODWMA
    2.9 GiB [#####     ] /e_U7K6A7RgKqYNvG-qhBIQ
    2.8 GiB [#####     ] /ZC1-IkaoSSOgeFLAAqR2ZQ
    2.8 GiB [#####     ] /C4IFuIKoQ9qLZZZJkLI-oQ
    1.6 GiB [###       ] /TyiSnxRHQnu029XNw1GzFQ
    1.6 GiB [##        ] /E7PCsPb3T_-71EW-z3xK9A
    1.4 GiB [##        ] /p9vkMLSDR86qS9iEmqGhtA
    1.4 GiB [##        ] /wDCtDLa5TDel7opjmXfn1A
    1.0 GiB [#         ] /TI-9432dRwGJ_JvYQ1d6vQ
  865.8 MiB [#         ] /YZQefigaRMKwCkuh-FfRRA
  710.5 MiB [#         ] /ag4c2ooRR22b9YDGnwDsvQ
   84.0 KiB [          ] /6tC2AmKSTa-TlXh8oSVPcg
   84.0 KiB [          ] /VffKpHTaQyiC6g9lPbo6HA
   84.0 KiB [          ] /PUjnQtAPQBq7yuGnofD9oQ
   84.0 KiB [          ] /5woPZPyEQlyEfdgG3P7kpw


As you mentioned, the data is shared across both nodes in the cluster. If one fills up, the other one should be full as well. At that stage you need to add another node or delete some data.

If you are not interested in resiliency and availability, you can set the number of replicas to 0, which will store only a single copy of each shard across the cluster instead of the default of 2 copies (one primary plus one replica). I would generally not recommend this, as it means you could lose data and not be able to get full results if one node went down.
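For reference, a minimal sketch of how that can be done on existing indices via the Elasticsearch index settings API (host and index pattern taken from this thread; adjust them to your environment). New indices take their replica count from the index template, or in Graylog from the index set configuration:

curl -XPUT '192.168.103.205:9200/graylog_*/_settings' \
  -H 'Content-Type: application/json' \
  -d '{ "index" : { "number_of_replicas" : 0 } }'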

Hello Christian,
thanks for your answer. It's good to know that the default is 2 copies.
Which setting do I need to change to set the replicas to 0?

To prevent data loss, I take backups of the servers.

I have checked it against
https://docs.graylog.org/en/2.3/pages/configuration/index_model.html#index-set-configuration
and the Index model page in the Graylog 4.0.0 documentation:

Index replicas: (default: 0) The number of Elasticsearch replicas used per index.

and then I used the following to confirm the settings on the indices.

All indices are set to 0 replicas, so now I'm confused.

root@vst-gl-p1:/etc/elasticsearch# curl -XGET 192.168.103.204:9200/graylog_*/_settings?pretty | grep number_of_replicas
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8863  100  8863    0     0   101k      0 --:--:-- --:--:-- --:--:--  103k
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
root@vst-gl-p1:/etc/elasticsearch# curl -XGET 192.168.103.205:9200/graylog_*/_settings?pretty | grep number_of_replicas
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8863  100  8863    0     0  26857      0 --:--:-- --:--:-- --:--:-- 26776
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
root@vst-gl-p1:/etc/elasticsearch# curl -XGET 192.168.103.206:9200/graylog_*/_settings?pretty | grep number_of_replicas
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8863  100  8863    0     0  53071      0 --:--:-- --:--:-- --:--:-- 53071
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
root@vst-gl-p1:/etc/elasticsearch# curl -XGET 192.168.103.209:9200/graylog_*/_settings?pretty | grep number_of_replicas
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8863  100  8863    0     0  1081k      0 --:--:-- --:--:-- --:--:-- 1081k
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",
        "number_of_replicas" : "0",

And the latest index:

  "graylog_21" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "4",
        "blocks" : {
          "write" : "true",
          "metadata" : "false",
          "read" : "false"
        },
        "provided_name" : "graylog_21",
        "creation_date" : "1623628922871",
        "analysis" : {
          "analyzer" : {
            "analyzer_keyword" : {
              "filter" : "lowercase",
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "TI-9432dRwGJ_JvYQ1d6vQ",
        "version" : {
          "created" : "7100299"
        }
      }
    }
  },

root@vst-gl-p1:/etc/elasticsearch# curl -XGET '192.168.103.204:9200/_cat/indices?v&pretty'
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   graylog_14         yWZj3ZPoSoyOfT3VtHlbvQ   4   0   38664921            0      9.5gb          9.5gb
green  open   graylog_13         kBw-8a8tQ8C2A7u7O3TuKQ   4   0   37515726            0     10.7gb         10.7gb
green  open   graylog_16         ag4c2ooRR22b9YDGnwDsvQ   4   0    3503195            0      1.3gb          1.3gb
green  open   graylog_15         wDCtDLa5TDel7opjmXfn1A   4   0    6873898            0      2.7gb          2.7gb
green  open   graylog_18         p9vkMLSDR86qS9iEmqGhtA   4   0    6768017            0      2.7gb          2.7gb
green  open   graylog_17         YZQefigaRMKwCkuh-FfRRA   4   0    4202276            0      1.6gb          1.6gb
green  open   graylog_19         TyiSnxRHQnu029XNw1GzFQ   4   0    8323693            0      3.2gb          3.2gb
green  open   gl-events_1        5woPZPyEQlyEfdgG3P7kpw   4   0          0            0       832b           832b
green  open   gl-events_0        VffKpHTaQyiC6g9lPbo6HA   4   0          0            0       832b           832b
green  open   graylog_9          ZC1-IkaoSSOgeFLAAqR2ZQ   4   0   13346667            0      5.5gb          5.5gb
green  open   gl-system-events_0 6tC2AmKSTa-TlXh8oSVPcg   4   0          0            0       832b           832b
green  open   gl-system-events_1 PUjnQtAPQBq7yuGnofD9oQ   4   0          0            0       832b           832b
green  open   graylog_21         TI-9432dRwGJ_JvYQ1d6vQ   4   0   14507022            0      4.8gb          4.8gb
green  open   graylog_10         C4IFuIKoQ9qLZZZJkLI-oQ   4   0   13066713            0      5.5gb          5.5gb
green  open   graylog_20         E7PCsPb3T_-71EW-z3xK9A   4   0    8070333            0      3.1gb          3.1gb
green  open   graylog_12         e_U7K6A7RgKqYNvG-qhBIQ   4   0   13881198            0      5.8gb          5.8gb
green  open   graylog_11         3llHkfWuRlSgYrYFbODWMA   4   0   14003570            0      5.8gb          5.8gb
green  open   graylog_22         DXGdUtHnSLKLCTdeoz1hMQ   4   0    7737804            0        3gb            3gb
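To see which data node each shard is actually allocated to (rather than per-index totals), the _cat/shards API can help; a sketch using a host from this thread:

curl -XGET '192.168.103.204:9200/_cat/shards/graylog_*?v'

With number_of_replicas at 0 this lists only primary shards, and the node column shows how they are balanced across data-node-1 and data-node-2.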

Hello,
today I found this in the Elasticsearch documentation:

The path.data settings can be set to multiple paths, in which case all paths will be used to store data (although the files belonging to a single shard will all be stored on the same data path):

path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3
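The same setting can also be written in the flattened YAML form that elasticsearch.yml accepts:

path.data: ["/mnt/elasticsearch_1", "/mnt/elasticsearch_2", "/mnt/elasticsearch_3"]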

Will this solve my problem, or will this also duplicate the data on the other mounts?

Thx Sven

You can add more disk space to your node that way, yes.
However, that feature is going to be deprecated soon, so I wouldn't use it.

Thanks for the answer. Then what should I use to add more space?

Can you post a link or a description for me?

Thx Sven

Can you add more nodes? Or maybe a node with larger space, then remove the smaller node?
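(If you go the route of replacing a node, one common approach for the "remove the smaller node" step is a shard allocation filter, sketched here with one of the data-node IPs from this thread as a placeholder; Elasticsearch then moves the shards off that node before you decommission it:

curl -XPUT '192.168.103.204:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{ "transient" : { "cluster.routing.allocation.exclude._ip" : "192.168.103.209" } }')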

I have a Graylog server with 3.7 TB of space; that works great.

For the test cluster, I would prefer to have only one data node
where I can add more space by attaching another hard disk,
or to add another data node to provide more space.

But as I said, if I add another data node, it mirrors the data
and does not use the new data node as additional space for the indices
when the other data node is full.

I haven't found a way to configure it this way
(if one data node is full, use the next).

Thx Sven
