How load balancing works between Logstash and Elasticsearch

Hi everyone,

I have a question about Logstash and how it load balances across an Elasticsearch cluster.
The warning "high disk watermark [90%] exceeded" is displayed in the Logstash log file, and documents are no longer saved in the Elasticsearch index.
After checking all the Elasticsearch nodes in the cluster, it appears that on 2 nodes disk usage is over 90%. There is no problem on the other nodes, where only 1% is used.
So my question is: why doesn't Logstash send documents to the available nodes? Or is this normal behavior?
In the Logstash configuration file, the output to Elasticsearch is set as:
output { elasticsearch { hosts => ["host_1:port", "host_2:port", "host_3:port",...."host_n:port"] }}
and communication between Logstash and the elasticsearch cluster is through SSL protocol.

Thx for your help,
Khaled

I think you are really asking about how elasticsearch load balances data across a cluster.

If you are using a single index and it only has a single replica then the data will only be distributed across two elasticsearch nodes. You really need to tell us more about your indexes.

Using the elasticsearch cat indices API might be helpful

GET /_cat/indices/?v=true

Hi Badger,

Thx for your answer.
Here is the result of the request :

health status index                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   idx-aggregated-logs-000001 qJJE4BUTQYyj9ExIUegJBQ          1   1  511638025            0    747.4gb        373.7gb

Can you also show the output from GET /_cat/shards please?

Here it is :

.ds-ilm-history-5-2022.08.31-000004                           0 p STARTED                   100.132.166.8  bil192939.prv.cloud
.ds-ilm-history-5-2022.08.31-000004                           0 r STARTED                   100.132.166.6  bil198688.prv.cloud
idx-aggregated-logs-000001                                    0 r STARTED 511638025 373.6gb 100.132.99.14  bil192488.prv.cloud
idx-aggregated-logs-000001                                    0 p STARTED 511638025 373.7gb 100.132.99.13  bil196016.prv.cloud
.ds-.logs-deprecation.elasticsearch-default-2022.08.10-000001 0 p STARTED                   100.132.99.12  bil196860.prv.cloud
.ds-.logs-deprecation.elasticsearch-default-2022.08.10-000001 0 r STARTED                   100.132.166.14 bil196062.prv.cloud
.ds-.logs-deprecation.elasticsearch-default-2022.08.24-000002 0 p STARTED                   100.132.99.10  bil195277.prv.cloud
.ds-.logs-deprecation.elasticsearch-default-2022.08.24-000002 0 r STARTED                   100.132.99.8   bil197728.prv.cloud
.ds-ilm-history-5-2022.06.02-000001                           0 p STARTED                   100.132.166.13 bil190257.prv.cloud
.ds-ilm-history-5-2022.06.02-000001                           0 r STARTED                   100.132.99.7   bil193961.prv.cloud
.ds-ilm-history-5-2022.08.01-000003                           0 p STARTED                   100.132.166.6  bil198688.prv.cloud
.ds-ilm-history-5-2022.08.01-000003                           0 r STARTED                   100.132.166.7  bil197047.prv.cloud
.ds-ilm-history-5-2022.07.02-000002                           0 p STARTED                   100.132.166.8  bil192939.prv.cloud
.ds-ilm-history-5-2022.07.02-000002                           0 r STARTED                   100.132.166.7  bil197047.prv.cloud

OK, so you have all your data in one index, which has a single primary shard of 374 GB and one replica. 374 GB is big; the rule of thumb is 10 to 50 GB per shard. You should read this. If I remember correctly, you set the number of shards using a template, which is applied when your index is created. I believe splitting a shard requires reindexing.
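To put a number on that rule of thumb, here is a rough sketch of the arithmetic. The 373.7 GB figure is the pri.store.size from the cat indices output above; the function name and the 50 GB default target are only illustrative, and the calculation ignores any future growth of the index.

```python
import math

def shards_needed(primary_size_gb, target_shard_gb=50):
    """Smallest primary shard count that keeps each shard at or under the target size."""
    return max(1, math.ceil(primary_size_gb / target_shard_gb))

primary_gb = 373.7  # pri.store.size reported by _cat/indices
count = shards_needed(primary_gb)
# Print the shard count and the resulting average size per shard in GB.
print(count, round(primary_gb / count, 1))
```

With those inputs the data would need 8 primary shards of roughly 47 GB each, which is enough to spread the index over more than two nodes.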

If you go with 50 GB shards for the new index, then they will all get written to the nodes that do not contain the existing shards (since those are full). Once you have verified that the reindex worked, you can delete the existing index. That will leave those two nodes with no shards on them. I believe the cluster will rebalance the shards, but that is definitely an Elasticsearch question and not a Logstash question, so you should direct any questions to that forum.

Thx a lot Badger.
I will read the doc and ask on the elasticsearch forum.

BR,
Khaled

Simply put, you don't have enough disk space on the Elasticsearch nodes. This has nothing to do with Logstash. Clean up the disk on the node, since the primary data is on only one node, bil196016. The replica is on bil192488, because it must be on a different node from the primary. Check disk usage per node:
GET _cat/allocation?v
Check the roles of each node and the index settings:
GET /_cat/nodes?v
GET /idx-aggregated-logs-000001/_settings - check number_of_shards, I assume it's 1
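
If you want to see quickly which nodes hold the shards of one index, something like the following sketch works on the text returned by GET /_cat/shards. The sample rows are copied from the output posted earlier in this thread (with whitespace collapsed); the function name is made up for illustration.

```python
from collections import defaultdict

# Three rows copied from the _cat/shards output posted earlier in the thread.
SAMPLE = """\
idx-aggregated-logs-000001 0 r STARTED 511638025 373.6gb 100.132.99.14 bil192488.prv.cloud
idx-aggregated-logs-000001 0 p STARTED 511638025 373.7gb 100.132.99.13 bil196016.prv.cloud
.ds-ilm-history-5-2022.08.31-000004 0 p STARTED 100.132.166.8 bil192939.prv.cloud
"""

def shards_by_node(cat_shards_text, index):
    """Map node name -> list of (shard number, 'p' or 'r') for one index."""
    placement = defaultdict(list)
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Skip blank or truncated rows and rows that belong to other indices.
        if len(fields) < 6 or fields[0] != index:
            continue
        # The node name is the last column; the docs/store columns may be empty.
        placement[fields[-1]].append((int(fields[1]), fields[2]))
    return dict(placement)

placement = shards_by_node(SAMPLE, "idx-aggregated-logs-000001")
for node, shards in sorted(placement.items()):
    print(node, shards)
```

For this index it lists only bil192488 (replica) and bil196016 (primary), which matches the two nodes whose disks are full.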

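ES wants disk usage to stay below the 90% high watermark (that is, at least 10% free) for stable operation, which makes sense since there are logs, temp data... If the nodes have disks of different sizes, I believe the usable disk size for ES is limited by the smallest one. If a disk fills past 90%, ES will stop allocating shards to that node, and at the flood-stage watermark (95% by default) it blocks writes to the affected indices.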
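
As a rough illustration of how the disk watermarks apply to the numbers in this thread, here is a small sketch. The thresholds are the Elasticsearch defaults (85% low, 90% high, 95% flood stage) and may be overridden in your cluster settings; the function name and the usage percentages passed to it are only examples.

```python
# Default Elasticsearch disk watermarks, as percent of disk used.
# Assumption: the cluster has not overridden these settings.
LOW, HIGH, FLOOD = 85.0, 90.0, 95.0

def watermark_status(used_pct):
    """Classify a node's disk usage against the default watermarks."""
    if used_pct >= FLOOD:
        return "flood_stage"  # indices with a shard on this node get a write block
    if used_pct >= HIGH:
        return "high"         # ES tries to relocate shards away from this node
    if used_pct >= LOW:
        return "low"          # no new shards are allocated to this node
    return "ok"

# Example values: a nearly empty node and a node just over 90% used.
for used in (1.0, 91.0):
    print(used, watermark_status(used))
```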

As Badger said, 50 GB is the recommended shard size for the best performance. Think about ILM in the future.

Hi Rios,

Thank you for this answer.

An ILM policy was already configured for the index, with rollover set to max_size 50gb and max_age 365d.

I will repost this on the Elasticsearch forum. Indeed, my problem has nothing to do with Logstash.

BR,
Khaled