Disk storage in one of them gets consumed as expected, but in the other it remains flat (and there are rejected events in the thread pools of most of the data nodes).
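For context, this is roughly how I'm watching the rejections (a sketch; the pool is `write` on this version, it may be `bulk` on older ones):

```
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected
```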
I have never run bulk updates, so I am not sure whether errors here would cause the update to be retried from Logstash or simply dropped. You seem to have a lot of time spent on management. Do you have a very large number of shards in the cluster? Are you using dynamic mappings? Do the hardware profiles supporting the clusters differ, especially with respect to the type of storage used? Is there anything in the Elasticsearch logs?
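If it helps, the shard count and per-node disk usage are both quick to check with the standard APIs (a sketch, nothing cluster-specific assumed):

```
GET _cluster/health?filter_path=status,number_of_data_nodes,active_shards
GET _cat/nodes?v&h=name,disk.used_percent,disk.total
```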
Are there any error messages in the Elasticsearch logs? Can you try enabling the dead-letter queue to see if this captures any errors that would otherwise be ignored/dropped?
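For reference, enabling the DLQ is a two-line change in `logstash.yml` (a sketch; the path below is just an example):

```
# logstash.yml -- enable the dead-letter queue (path is an example)
dead_letter_queue.enable: true
path.dead_letter_queue: /var/lib/logstash/dlq
```

Note that the DLQ only captures events that the elasticsearch output fails to deliver (e.g. mapping errors), so it's a good way to surface documents that are otherwise silently dropped.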
I checked the logs and there don't seem to be any errors. There's only one thing the cluster complains about a lot:
[2019-12-18T14:39:44,682][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [arm-or-002_master] collector [cluster_stats] timed out when collecting data
and
[2019-12-18T14:29:24,586][ERROR][o.e.x.m.c.i.IndexStatsCollector] [arm-or-002_master] collector [index-stats] timed out when collecting data
Whenever this is logged, it leaves a "blank" patch in the overview section of the cluster's monitoring in Kibana (as if the cluster were unresponsive for the duration of the exception).
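Since these collector timeouts are dynamic cluster settings (in recent versions at least), I'm going to try raising them as a workaround while I dig into why stats collection is so slow (30s here is an arbitrary bump, not a recommendation):

```
PUT _cluster/settings
{
  "transient": {
    "xpack.monitoring.collection.cluster.stats.timeout": "30s",
    "xpack.monitoring.collection.index.stats.timeout": "30s"
  }
}
```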