Firstly, that does sound like a bad cluster setup. As Elasticsearch is consensus-based, you always want at least 3 master-eligible nodes; having just two is bad. I would therefore recommend making all nodes master-eligible and also ensuring you have set discovery.zen.minimum_master_nodes to 2, as per these guidelines.
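As a sketch, the relevant elasticsearch.yml settings on each of the three nodes would look something like this (assuming a pre-7.x cluster that uses Zen discovery; adjust to your version):

```yaml
# All three nodes master-eligible; quorum for 3 masters is (3 / 2) + 1 = 2
node.master: true
discovery.zen.minimum_master_nodes: 2
```

With this in place the cluster can still elect a master if any single node is lost, without risking split brain.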
The recommended shard count typically depends on how much heap you have. As your nodes seem to have no more than 4GB of heap, the given numbers sound a bit high. You can find guidelines and recommendations in the resources I linked to.
How many of the indices are you actively writing into? What is your average shard size? Are you adding and/or updating data in the indices? Are you using the default 5 primary shards per index?
I would recommend using a single primary shard if you have such low volumes.
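One way to do this is through an index template so new indices pick up the setting automatically. A sketch (the template name and index pattern are placeholders for your own naming scheme):

```console
PUT _template/low_volume_logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

This only affects indices created after the template is in place; existing indices keep their current shard count.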
If I calculate correctly, that means you have 300 beats independently sending indexing requests to Elasticsearch, possibly using quite small batch sizes. If reducing the number of primary shards does not help, one way to make indexing more efficient would be to introduce Logstash and let it consolidate all the small batches to a few larger ones, which generally is more efficient. If all nodes have the same amount of disk space you could also make the third node a data node.
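The consolidation idea can be illustrated with a minimal sketch (this is not the Logstash or Elasticsearch client API, just the buffering principle): many small batches from independent shippers are accumulated and flushed as fewer, larger bulk requests. The class name, flush size, and simulated shipper counts are illustrative assumptions.

```python
class BulkBuffer:
    """Accumulate small batches and release them as larger consolidated ones."""

    def __init__(self, flush_size=1000):
        self.flush_size = flush_size
        self.events = []

    def add_batch(self, batch):
        """Accept a small batch; return a consolidated batch once full, else None."""
        self.events.extend(batch)
        if len(self.events) >= self.flush_size:
            flushed, self.events = self.events, []
            return flushed  # in practice this would become one _bulk request
        return None


buffer = BulkBuffer(flush_size=1000)
flushed_batches = []
# Simulate 300 shippers each sending a small batch of 10 events.
for shipper in range(300):
    out = buffer.add_batch([{"shipper": shipper, "seq": i} for i in range(10)])
    if out is not None:
        flushed_batches.append(out)

# 3000 events arrive as 300 tiny requests but leave as 3 bulk requests.
print(len(flushed_batches), [len(b) for b in flushed_batches])
```

The same 3000 events that would have been 300 small indexing requests are sent as 3 requests of 1000 events each, which is the kind of consolidation Logstash performs for you.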
Not necessarily, as Elasticsearch can handle a large number of connections. Indexing using small bulk requests is however generally less efficient than using larger ones, and if more work needs to be done during indexing because of this, queues may back up. The number of shards you index into also plays a part.
That does indeed reduce space, but removes high availability and increases the risk of the cluster going red.
If a node is lost, access to those shards will be lost and the affected indices will turn red, preventing indexing and potentially leading to data loss.
That is just over 1000 events per second, which a single Logstash instance should easily be able to cope with. The only reason you would need 2 instances is if you would like to have 2 for high availability.
No, there is no hard requirement like that. If you have one index that gets a lot more data than the others, you may want to set this to have the same number of primary shards as you have data nodes, as this will spread the load. If different indices however get a similar load, the load is likely to be evened out anyway, even if not all nodes have a shard for each index.
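For such a high-volume index you could override the shard count at creation time. A sketch, assuming 3 data nodes (the index name is a placeholder):

```console
PUT high-volume-index
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}
```

The other, low-volume indices can stay at a single primary shard.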