How to reduce thread pool rejections in an Elasticsearch cluster?

My cluster has 3 nodes:

Linux3 - master node (2 cores, 8 GB RAM)
Linux2 - master and data node (4 cores, 8 GB RAM)
Linux1 - data node (2 cores, 8 GB RAM)

The cluster is showing a large number of thread pool rejections.

What is the best way to reduce the rejections?

I am getting Beats data from 50 servers every hour, at most 150,000 events per hour.

The cluster has 136 indices and 520 shards.

Why am I getting such a large number of rejections?

I am not seeing heavy heap or CPU usage.
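
For reference, this is roughly how I read the rejection counts, using the standard _cat thread pool API (the host here is just a placeholder for one of my nodes):

```python
import requests

# Placeholder host; point this at any node in the cluster.
ES = "http://localhost:9200"

# _cat/thread_pool lists active, queued and rejected tasks per pool.
# Depending on the version, the indexing pool is called "bulk" or "write".
resp = requests.get(
    f"{ES}/_cat/thread_pool",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(resp.text)
```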

Firstly, that does sound like a bad cluster set-up. As Elasticsearch is consensus-based you always want at least 3 master-eligible nodes. Having just two is bad. I would therefore recommend making all nodes master-eligible and also ensure you have set discovery.zen.minimum_master_nodes to 2 as per these guidelines.
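
As a sketch, once all three nodes are master-eligible, that setting can also be applied dynamically through the cluster settings API (assuming a pre-7.0 version, since you are using discovery.zen; the host is a placeholder, and the setting can equally go in elasticsearch.yml):

```python
import requests

ES = "http://localhost:9200"  # placeholder; any node in the cluster

# With 3 master-eligible nodes, minimum_master_nodes should be 2 (a majority).
requests.put(
    f"{ES}/_cluster/settings",
    json={"persistent": {"discovery.zen.minimum_master_nodes": 2}},
).raise_for_status()
```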

Given the amount of RAM and the number of data nodes, it also looks like you have a somewhat high shard count. If you are indexing into many of these, too high a shard count combined with a large number of concurrent indexing operations can cause bulk rejections.

OK, I will add one more master node to the cluster.

Why did you say I have a high shard count? Is there a recommendation for RAM and shards per node?

I thought 300-500 shards per node was fine.

The recommended shard count typically depends on how much heap you have. As your nodes seem to have no more than 4 GB of heap, the given numbers sound a bit high. You can find guidelines and recommendations in the resources I linked to.
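
If you want to check where you stand, something like the following shows heap usage and the shard count per node (the host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Heap usage per node.
print(requests.get(
    f"{ES}/_cat/nodes",
    params={"v": "true", "h": "name,heap.current,heap.max,heap.percent"},
).text)

# Number of shards and disk usage per node.
print(requests.get(f"{ES}/_cat/allocation", params={"v": "true"}).text)
```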

I didn't understand what is meant by a bulk queue size of 200.

If I increase RAM and heap size, will that solve the problem?

How many of the indices are you actively writing into? What is your average shard size? Are you adding and/or updating data in the indices? Are you using the default 5 primary shards per index?
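
You can pull those numbers from the _cat indices API, e.g. (host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Primary shard count, replica count, doc count and primary store size
# per index, sorted by primary store size.
print(requests.get(
    f"{ES}/_cat/indices",
    params={"v": "true",
            "h": "index,pri,rep,docs.count,pri.store.size",
            "s": "pri.store.size:desc"},
).text)
```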

Usually I am writing to 12 indices / 24 shards at a time (data coming from 6 Beats installed on 50 servers).

I may get about 5 GB of data per day.

I am using 2 shards per index.

I set the number of replicas to zero.

Generally sounds like a bad idea.

I would recommend using a single primary shard if you have such low volumes.
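
As a sketch, you could do this with an index template so new daily indices are created with a single primary shard. The template name and index pattern below are just examples, and I am assuming a 6.x-style legacy template API:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Example legacy index template; adjust the pattern to match your Beats indices.
requests.put(
    f"{ES}/_template/beats-single-shard",
    json={
        "index_patterns": ["*beat-*"],
        "order": 1,
        "settings": {
            "index.number_of_shards": 1,
            "index.number_of_replicas": 1,
        },
    },
).raise_for_status()
```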

If I calculate correctly, that means you have 300 beats independently sending indexing requests to Elasticsearch, possibly using quite small batch sizes. If reducing the number of primary shards does not help, one way to make indexing more efficient would be to introduce Logstash and let it consolidate all the small batches to a few larger ones, which generally is more efficient. If all nodes have the same amount of disk space you could also make the third node a data node.

So you are saying to send data from Beats to Logstash and have Logstash forward it to the cluster?

Will Elasticsearch have any issue with so many connections coming in at the same time, i.e. the 300 Beats connecting to it?

I set replicas to zero to reduce the size.

Yes, that might be worth trying.

Not necessarily, as Elasticsearch can handle a large number of connections. Indexing using small bulk requests is however generally less efficient than using larger ones, and if more indexing work needs to be done because of this, queues may back up. The number of shards you index into also plays a part.
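
To illustrate the difference, a consolidated bulk request (which is roughly what Logstash would do for you) looks like this. The host and index name are placeholders, and the "doc" type is assumed for a 6.x cluster:

```python
import json
import requests

ES = "http://localhost:9200"      # placeholder
INDEX = "metricbeat-2018.01.01"   # placeholder index name

# Build one bulk body from many small events instead of sending them one by one.
events = [{"message": f"event {i}"} for i in range(1000)]
lines = []
for event in events:
    # "_type" is required on 6.x and earlier.
    lines.append(json.dumps({"index": {"_index": INDEX, "_type": "doc"}}))
    lines.append(json.dumps(event))
body = "\n".join(lines) + "\n"

resp = requests.post(f"{ES}/_bulk", data=body,
                     headers={"Content-Type": "application/x-ndjson"})
resp.raise_for_status()
```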

That does indeed reduce space, but removes high availability and increases the risk of the cluster going red.
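
Adding the replicas back is a live settings change, e.g. (placeholder host, applied to all existing indices here; note that disk usage will roughly double):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Set one replica per primary shard on all existing indices.
requests.put(
    f"{ES}/_all/_settings",
    json={"index": {"number_of_replicas": 1}},
).raise_for_status()
```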

OK, thanks for the help. I will try to use Logstash and also change my replica count back to 1.

I thought only the primary is normally used, and the replica is only used in error conditions, e.g. when the primary shard is unavailable.

If a node is lost, access to those shards will be lost and the affected indices will turn red, preventing indexing and potentially leading to data loss.

OK, I understand. Thanks. Do I need to increase the heap size? Is a 4 GB heap on each of the 3 data nodes enough?

Given the amount of data you seem to have in the cluster, I suspect 4 GB of heap per node is more than enough.

In our case we will have the following situation in the future:

Number of servers where Beats are installed = 260

Indices per day = 6

Master nodes as of today = 3

Data nodes = 3

Total nodes = 4 (1 dedicated master node)

Number of endpoint Beats sending data = 1,560 (260 x 6)

Expected event rate = 3.7 million/hour

If I add Logstash to accept the data and send it on to the cluster, will it work fine, or do I need to increase the cluster size?

I do not know. Try and see.

Can Logstash handle it, or do I need multiple Logstash endpoints?

That is just over 1,000 events per second, which a single Logstash instance should easily cope with. The only reason you would need 2 instances is if you would like to have 2 for high availability.

Is there any connection between the number of shards per index and the number of data nodes?

Do I need to make them the same number, e.g. 3 data nodes => 3 shards?

No, there is no hard requirement like that. If you have one index that gets a lot more data than the others, you may want to give it the same number of primary shards as you have data nodes, as this will spread the load. If different indices get similar load, however, the load is likely to be evened out anyway, even if not every node holds a shard of each index.
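
If you want to verify how the shards of a busy index are spread out, the _cat shards API shows which node each shard copy lives on (host and index name are placeholders):

```python
import requests

ES = "http://localhost:9200"       # placeholder
INDEX = "metricbeat-2018.01.01"    # placeholder index name

# One line per shard copy: shard number, primary/replica, state, and the node holding it.
print(requests.get(
    f"{ES}/_cat/shards/{INDEX}",
    params={"v": "true", "h": "index,shard,prirep,state,node"},
).text)
```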