How to reduce thread pool rejections in an Elasticsearch cluster?

My cluster has 3 nodes:

Linux3 - master node (2 cores, 8 GB RAM)
Linux2 - master and data node (4 cores, 8 GB RAM)
Linux1 - data node (2 cores, 8 GB RAM)

The cluster is showing a large number of thread pool rejections.

What is the best way to reduce the rejections?

I am getting Beats data from 50 servers every hour, at most 150,000 events per hour.

The cluster has 136 indices and 520 shards.

Why am I getting such a large number of rejections?

I am not seeing heavy heap or CPU usage.
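
For reference, this is roughly how I read the rejection counts, using the standard _cat thread pool API (the host here is just a placeholder for one of my nodes):

```python
import requests

# Placeholder host; point this at any node in the cluster.
ES = "http://localhost:9200"

# _cat/thread_pool lists active, queued and rejected tasks per pool.
# Depending on the version, the indexing pool is called "bulk" or "write".
resp = requests.get(
    f"{ES}/_cat/thread_pool",
    params={"v": "true", "h": "node_name,name,active,queue,rejected"},
)
print(resp.text)
```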

Firstly, that does sound like a bad cluster set-up. As Elasticsearch is consensus-based you always want at least 3 master-eligible nodes. Having just two is bad. I would therefore recommend making all nodes master-eligible and also ensure you have set discovery.zen.minimum_master_nodes to 2 as per these guidelines.
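
As a sketch, once all three nodes are master-eligible, that setting can also be applied dynamically through the cluster settings API (assuming a pre-7.0 version, since you are using discovery.zen; the host is a placeholder, and the setting can equally go in elasticsearch.yml):

```python
import requests

ES = "http://localhost:9200"  # placeholder; any node in the cluster

# With 3 master-eligible nodes, minimum_master_nodes should be 2 (a majority).
requests.put(
    f"{ES}/_cluster/settings",
    json={"persistent": {"discovery.zen.minimum_master_nodes": 2}},
).raise_for_status()
```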

Given the amount of RAM and the number of data nodes, it also looks like you have a somewhat high shard count. If you are indexing into many of these, too high a shard count combined with a large number of concurrent indexing operations can cause bulk rejections.

OK, I will add one more master node to the cluster.

Why did you say I have a high shard count? Is there a recommendation for RAM and shards per node?

I thought 300-500 shards per node was fine.

The recommended shard count typically depends on how much heap you have. As your nodes seem to have no more than 4 GB of heap, the given numbers sound a bit high. You can find guidelines and recommendations in the resources I linked to.
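
If you want to check where you stand, something like the following shows heap usage and the shard count per node (the host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Heap usage per node.
print(requests.get(
    f"{ES}/_cat/nodes",
    params={"v": "true", "h": "name,heap.current,heap.max,heap.percent"},
).text)

# Number of shards and disk usage per node.
print(requests.get(f"{ES}/_cat/allocation", params={"v": "true"}).text)
```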

I didn't understand what is meant by a bulk queue size of 200.

If I increase RAM and heap size, will that solve the problem?

How many of the indices are you actively writing into? What is your average shard size? Are you adding and/or updating data in the indices? Are you using the default 5 primary shards per index?
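
You can pull those numbers from the _cat indices API, e.g. (host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Primary shard count, replica count, doc count and primary store size
# per index, sorted by primary store size.
print(requests.get(
    f"{ES}/_cat/indices",
    params={"v": "true",
            "h": "index,pri,rep,docs.count,pri.store.size",
            "s": "pri.store.size:desc"},
).text)
```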

Usually I am writing to 12 indices / 24 shards at a time (data coming from 6 Beats installed on 50 servers).

I may get about 5 GB of data per day.

I am using 2 shards per index.

I set the number of replicas to zero.

Generally sounds like a bad idea.

I would recommend using a single primary shard if you have such low volumes.
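
As a sketch, you could do this with an index template so new daily indices are created with a single primary shard. The template name and index pattern below are just examples, and I am assuming a 6.x-style legacy template API:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Example legacy index template; adjust the pattern to match your Beats indices.
requests.put(
    f"{ES}/_template/beats-single-shard",
    json={
        "index_patterns": ["*beat-*"],
        "order": 1,
        "settings": {
            "index.number_of_shards": 1,
            "index.number_of_replicas": 1,
        },
    },
).raise_for_status()
```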

If I calculate correctly, that means you have 300 beats independently sending indexing requests to Elasticsearch, possibly using quite small batch sizes. If reducing the number of primary shards does not help, one way to make indexing more efficient would be to introduce Logstash and let it consolidate all the small batches to a few larger ones, which generally is more efficient. If all nodes have the same amount of disk space you could also make the third node a data node.

So you are saying to send data from Beats to Logstash and have Logstash forward it to the cluster?

Will Elasticsearch have any issue with so many connections coming in at the same time, i.e. the 300 Beats connecting to it?

I set replicas to zero to reduce the size.

Yes, that might be worth trying.

Not necessarily, as Elasticsearch can handle a large number of connections. Indexing using small bulk requests is however generally less efficient than using larger ones, and if more indexing work needs to be done because of this, queues may back up. The number of shards you index into also plays a part.
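
To illustrate the difference, a consolidated bulk request (which is roughly what Logstash would do for you) looks like this. The host and index name are placeholders, and the "doc" type is assumed for a 6.x cluster:

```python
import json
import requests

ES = "http://localhost:9200"      # placeholder
INDEX = "metricbeat-2018.01.01"   # placeholder index name

# Build one bulk body from many small events instead of sending them one by one.
events = [{"message": f"event {i}"} for i in range(1000)]
lines = []
for event in events:
    # "_type" is required on 6.x and earlier.
    lines.append(json.dumps({"index": {"_index": INDEX, "_type": "doc"}}))
    lines.append(json.dumps(event))
body = "\n".join(lines) + "\n"

resp = requests.post(f"{ES}/_bulk", data=body,
                     headers={"Content-Type": "application/x-ndjson"})
resp.raise_for_status()
```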

That does indeed reduce space, but removes high availability and increases the risk of the cluster going red.
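
Adding the replicas back is a live settings change, e.g. (placeholder host, applied to all existing indices here; note that disk usage will roughly double):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Set one replica per primary shard on all existing indices.
requests.put(
    f"{ES}/_all/_settings",
    json={"index": {"number_of_replicas": 1}},
).raise_for_status()
```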

OK, thanks for the help. I will try to use Logstash and also change my replica count back to 1.

I thought only the primary is normally used, and the replica is only used in error conditions, e.g. when the primary shard is unavailable.

If a node is lost, access to those shards will be lost and the affected indices will turn red, preventing indexing and potentially leading to data loss.

OK, I understand. Thanks. Do I need to increase the heap size? Is a 4 GB heap on each of the 3 data nodes enough?

Given the amount of data you seem to have in the cluster, I suspect 4 GB of heap per node is more than enough.

In our case we will have the following situation in the future:

Number of servers where Beats are installed = 260

Indices per day = 6

Master nodes as of today = 3

Data nodes = 3

Total nodes = 4 (1 dedicated master node)

Number of endpoint Beats sending data = 1,560 (260 x 6)

Expected event rate = 3.7 million/hour

If I add Logstash to accept the data and send it on to the cluster, will it work fine, or do I need to increase the cluster size?

I do not know. Try and see.

Can Logstash handle it, or do I need multiple Logstash endpoints?

That is just over 1,000 events per second, which a single Logstash instance should easily cope with. The only reason you would need 2 instances is if you would like to have 2 for high availability.

Is there any connection between the number of shards per index and the number of data nodes?

Do I need to make them the same number, e.g. 3 data nodes => 3 shards?

No, there is no hard requirement like that. If you have one index that gets a lot more data than the others, you may want to give it the same number of primary shards as you have data nodes, as this will spread the load. If different indices get similar load, however, the load is likely to be evened out anyway, even if not every node holds a shard of each index.
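
If you want to verify how the shards of a busy index are spread out, the _cat shards API shows which node each shard copy lives on (host and index name are placeholders):

```python
import requests

ES = "http://localhost:9200"       # placeholder
INDEX = "metricbeat-2018.01.01"    # placeholder index name

# One line per shard copy: shard number, primary/replica, state, and the node holding it.
print(requests.get(
    f"{ES}/_cat/shards/{INDEX}",
    params={"v": "true", "h": "index,shard,prirep,state,node"},
).text)
```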