org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService

Hi guys, I have this error on Elasticsearch, and the logs keep growing. I urgently need help here. Thank you! :slight_smile:

org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@52bbdff1 on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@49c9d421[Running, pool size = 7, active threads = 7, queued tasks = 1000, completed tasks = 7265472]]

It means your Elasticsearch cluster has more search tasks than it can handle: the search thread pool's queue is constantly full, and the node will reject new search tasks until the queue drops back below 1000 (the default capacity).

You basically have three choices:

  1. Reduce the workload.
  2. Add more nodes to the cluster to share the workload.
  3. Get stronger hardware.

The first option may not always be feasible; if so, you need to grow the cluster's work capacity by either adding more nodes or improving the hardware. I see your pool size is just 7, which indicates 4-CPU hardware. For the search pool, Elasticsearch uses a size of (1.5 x number of cores) + 1. For instance, with 24 CPUs the pool size would be 37, giving you 37 worker threads to handle the queued tasks.
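If you want to see that pressure directly, the cat thread pool API reports the pool size, active threads, queue depth and rejection count per node (assuming Elasticsearch is reachable on localhost:9200):

curl -s 'localhost:9200/_cat/thread_pool/search?v&h=node_name,name,size,active,queue,rejected'

A steadily growing rejected counter confirms the queue is saturating rather than just spiking.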

How many concurrent queries are you serving? How many shards does each query typically address?

Hi @Bernt_Rostad, sorry, I'm kind of new here. How can I add more nodes to the cluster to share the workload? The hardware I have right now is:
Intel Xeon E5-2666 v3 (Haswell)
4 vCPU
7.5 GB memory

@Christian_Dahlqvist, how can I check the concurrent queries and shards?

Thank you.

How many shards typically match the index pattern your queries are using? You can list shards using the cat shards API.
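For example, something like this lists one line per shard for the filebeat indices (assuming the node is reachable on localhost:9200):

curl -s 'localhost:9200/_cat/shards/filebeat-*?v'

The number of STARTED shards matching your query's index pattern is roughly the number of search tasks each query fans out into.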

How are you querying the cluster? Kibana?

If you have X-Pack monitoring installed, this can tell you how many queries the cluster is serving.
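Without X-Pack, a rough alternative is to sample the search counters in the node stats API twice and divide the difference in query_total by the interval to estimate queries per second:

curl -s 'localhost:9200/_nodes/stats/indices/search?pretty'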

I don't have X-Pack installed. Here's the output:

filebeat-2017.11.23 2 p STARTED 37617 26.5mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.23 2 r UNASSIGNED
filebeat-2017.11.23 0 p STARTED 37620 26.5mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.23 0 r UNASSIGNED
filebeat-2017.10.06 1 p STARTED 437866 169.4mb 127.0.0.1 Eq-uD9o
filebeat-2017.10.06 1 r UNASSIGNED
filebeat-2017.10.06 3 p STARTED 438124 169.7mb 127.0.0.1 Eq-uD9o
filebeat-2017.10.06 3 r UNASSIGNED
filebeat-2017.10.06 4 p STARTED 437226 168.7mb 127.0.0.1 Eq-uD9o
filebeat-2017.10.06 4 r UNASSIGNED
filebeat-2017.10.06 2 p STARTED 436374 169.2mb 127.0.0.1 Eq-uD9o
filebeat-2017.10.06 2 r UNASSIGNED
filebeat-2017.10.06 0 p STARTED 437640 168.5mb 127.0.0.1 Eq-uD9o
filebeat-2017.10.06 0 r UNASSIGNED
filebeat-2018.01.24 1 p STARTED 58524 46.9mb 127.0.0.1 Eq-uD9o
filebeat-2018.01.24 1 r UNASSIGNED
filebeat-2018.01.24 3 p STARTED 58018 46.1mb 127.0.0.1 Eq-uD9o
filebeat-2018.01.24 3 r UNASSIGNED
filebeat-2018.01.24 4 p STARTED 58608 46.7mb 127.0.0.1 Eq-uD9o
filebeat-2018.01.24 4 r UNASSIGNED
filebeat-2018.01.24 2 p STARTED 58439 46.5mb 127.0.0.1 Eq-uD9o
filebeat-2018.01.24 2 r UNASSIGNED
filebeat-2018.01.24 0 p STARTED 58401 46.5mb 127.0.0.1 Eq-uD9o
filebeat-2018.01.24 0 r UNASSIGNED
filebeat-2017.11.19 1 p STARTED 15609 11mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.19 1 r UNASSIGNED
filebeat-2017.11.19 3 p STARTED 15663 11.2mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.19 3 r UNASSIGNED
filebeat-2017.11.19 4 p STARTED 15969 11.3mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.19 4 r UNASSIGNED
filebeat-2017.11.19 2 p STARTED 15887 11.2mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.19 2 r UNASSIGNED
filebeat-2017.11.19 0 p STARTED 15843 11.2mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.19 0 r UNASSIGNED
filebeat-2017.11.08 1 p STARTED 32759 22.8mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.08 1 r UNASSIGNED
filebeat-2017.11.08 3 p STARTED 32592 22.7mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.08 3 r UNASSIGNED
filebeat-2017.11.08 4 p STARTED 33020 22.9mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.08 4 r UNASSIGNED
filebeat-2017.11.08 2 p STARTED 32819 22.8mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.08 2 r UNASSIGNED
filebeat-2017.11.08 0 p STARTED 32708 22.7mb 127.0.0.1 Eq-uD9o
filebeat-2017.11.08 0 r UNASSIGNED

There's a lot more. Thanks!

Can you provide the full output of the cluster stats API? I suspect you have too many shards, as your indices seem to use the default of 5 shards and go back quite some time. Please read this blog post with guidelines about shards and sharding and try to reduce your shard count. This should allow you to query longer time periods without addressing too many shards.
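For reference, the cluster stats output can be fetched with a call like this (assuming the default localhost binding; human adds readable size units):

curl -s 'localhost:9200/_cluster/stats?human&pretty'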

Here's the output for cluster stats. Btw, I'm using Elasticsearch 5.4.2.

{"_nodes":{"total":1,"successful":1,"failed":0},"cluster_name":"elasticsearch","timestamp":1517228627775,"status":"yellow","indices":{"count":218,"shards":{"total":1086,"primaries":1086,"replication":0.0,"index":{"shards":{"min":1,"max":5,"avg":4.981651376146789},"primaries":{"min":1,"max":5,"avg":4.981651376146789},"replication":{"min":0.0,"max":0.0,"avg":0.0}}},"docs":{"count":110738270,"deleted":431144},"store":{"size_in_bytes":46893135725,"throttle_time_in_millis":0},"fielddata":{"memory_size_in_bytes":0,"evictions":0},"query_cache":{"memory_size_in_bytes":1504624,"total_count":22979719,"hit_count":22218986,"miss_count":760733,"cache_size":24290,"cache_count":25006,"evictions":716},"completion":{"size_in_bytes":0},"segments":{"count":7115,"memory_in_bytes":195825045,"terms_memory_in_bytes":153968445,"stored_fields_memory_in_bytes":15675896,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":7692736,"points_memory_in_bytes":4175228,"doc_values_memory_in_bytes":14312740,"index_writer_memory_in_bytes":28019676,"version_map_memory_in_bytes":220680,"fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":1517222506040,"file_sizes":{}}},"nodes":{"count":{"total":1,"data":1,"coordinating_only":0,"master":1,"ingest":1},"versions":["5.4.2"],"os":{"available_processors":4,"allocated_processors":4,"names":[{"name":"Linux","count":1}],"mem":{"total_in_bytes":7841579008,"free_in_bytes":147542016,"used_in_bytes":7694036992,"free_percent":2,"used_percent":98}},"process":{"cpu":{"percent":45},"open_file_descriptors":{"min":3582,"max":3582,"avg":3582}},"jvm":{"max_uptime_in_millis":6398899,"versions":[{"version":"1.8.0_151","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.151-b12","vm_vendor":"Oracle Corporation","count":1}],"mem":{"heap_used_in_bytes":3180768432,"heap_max_in_bytes":3186360320},"threads":59},"fs":{"total_in_bytes":104022159360,"free_in_bytes":41403764736,"available_in_bytes":41386987520},"plugins":[],"network_types":{"transport_types":{"netty4":1},"http_types":{"netty4":1}}}}root@ELK:/home/kmelchor# 
{"_nodes":{"total":1,"successful":1,"failed":0},"cluster_name":"elasticsearch","timestamp":1517228627775,"status":"yellow","indices":{"count":218,"shards":{"total":1086,"primaries":1086,"replication":0.0,"index":{"shards":{"min":1,"max":5,"avg":4.981651376146789},"primaries":{"min":1,"max":5,"avg":4.981651376146789},"replication":{"min":0.0,"max":0.0,"avg":0.0}}},"docs":{"count":110738270,"deleted":431144},"store":{"size":"43.6gb","size_in_bytes":46882229447,"throttle_time":"0s","throttle_time_in_millis":0},"fielddata":{"memory_size":"0b","memory_size_in_bytes":0,"evictions":0},"query_cache":{"memory_size":"1.4mb","memory_size_in_bytes":1504624,"total_count":22979803,"hit_count":22219070,"miss_count":760733,"cache_size":24290,"cache_count":25006,"evictions":716},"completion":{"size":"0b","size_in_bytes":0},"segments":{"count":7115,"memory":"186.7mb","memory_in_bytes":195825045,"terms_memory":"146.8mb","terms_memory_in_bytes":153968445,"stored_fields_memory":"14.9mb","stored_fields_memory_in_bytes":15675896,"term_vectors_memory":"0b","term_vectors_memory_in_bytes":0,"norms_memory":"7.3mb","norms_memory_in_bytes":7692736,"points_memory":"3.9mb","points_memory_in_bytes":4175228,"doc_values_memory":"13.6mb","doc_values_memory_in_bytes":14312740,"index_writer_memory":"26.7mb","index_writer_memory_in_bytes":28020104,"version_map_memory":"217kb","version_map_memory_in_bytes":222219,"fixed_bit_set":"0b","fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":1517222506040,"file_sizes":{}}},"nodes":{"count":{"total":1,"data":1,"coordinating_only":0,"master":1,"ingest":1},"versions":["5.4.2"],"os":{"available_processors":4,"allocated_processors":4,"names":[{"name":"Linux","count":1}],"mem":{"total":"7.3gb","total_in_bytes":7841579008,"free":"134.3mb","free_in_bytes":140886016,"used":"7.1gb","used_in_bytes":7700692992,"free_percent":2,"used_percent":98}},"process":{"cpu":{"percent":26},"open_file_descriptors":{"min":3570,"max":3570,"avg":3570}},"jvm":{"max_uptime":"1.7h","max_uptime_in_millis":6412735,"versions":[{"version":"1.8.0_151","vm_name":"Java HotSpot(TM) 64-Bit Server VM","vm_version":"25.151-b12","vm_vendor":"Oracle Corporation","count":1}],"mem":{"heap_used":"2.9gb","heap_used_in_bytes":3157385112,"heap_max":"2.9gb","heap_max_in_bytes":3186360320},"threads":61},"fs":{"total":"96.8gb","total_in_bytes":104022159360,"free":"38.5gb","free_in_bytes":41400393728,"available":"38.5gb","available_in_bytes":41383616512},"plugins":[],"network_types":{"transport_types":{"netty4":1},"http_types":{"netty4":1}}}}

I'm looking at the post you just sent right now.

Yes, it looks like you have a lot of very small shards, which is causing problems. You should be able to create an index template that sets the number of primary shards to 1 for all new indices and then use the shrink index API to reduce the primary shard count to 1 for all existing indices. It may be worthwhile for you to consider using monthly indices with a single shard.
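To make that concrete, here is a minimal sketch against 5.x. The template name, the example daily index and the -shrunk target name are placeholders of my choosing; note the shrink API requires the source index to be read-only first (on a single-node cluster the usual shard-relocation step is unnecessary, since all shards are already on the one node):

# Give all new filebeat indices a single primary shard (5.x templates use the "template" pattern key)
curl -XPUT 'localhost:9200/_template/filebeat-single-shard' -H 'Content-Type: application/json' -d '{
  "template": "filebeat-*",
  "order": 1,
  "settings": { "index.number_of_shards": 1 }
}'

# Block writes on one existing daily index, then shrink it into a 1-shard copy
curl -XPUT 'localhost:9200/filebeat-2017.10.06/_settings' -H 'Content-Type: application/json' -d '{
  "index.blocks.write": true
}'
curl -XPOST 'localhost:9200/filebeat-2017.10.06/_shrink/filebeat-2017.10.06-shrunk' -H 'Content-Type: application/json' -d '{
  "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 0 }
}'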

Will shrinking the shards affect the logs? Because I want to keep my logs intact for at least a year.

It will not affect the data, just reduce the shard count.

Okay. I've created a template. How do I use the shrink Index API to reduce the primary shard count to 1?

{
  "filebeat" : {
    "order" : 0,
    "template" : "filebeat-*",
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [
          {
            "template1" : {
              "mapping" : {
                "ignore_above" : 1024,
                "index" : "not_analyzed",
                "type" : "{dynamic_type}",
                "doc_values" : true
              },
              "match" : "*"
            }
          }
        ],
        "_all" : {
          "norms" : {
            "enabled" : false
          },
          "enabled" : true
        },
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "geoip" : {
            "dynamic" : true,
            "type" : "object",
            "properties" : {
              "location" : {
                "type" : "geo_point"
              }
            }
          },
          "offset" : {
            "type" : "long",
            "doc_values" : "true"
          },
          "message" : {
            "index" : "analyzed",
            "type" : "string"
          }
        }
      }
    },
    "aliases" : { }
  }
}

Did you look at the documentation I linked to?

Yes, but I'm quite confused about the shrink API part. Btw, did I get the template right, or did I mess it up? Thanks! :slight_smile:

Adding a new node to a cluster is just as easy as starting up a single node; just make sure to use the same cluster.name in the elasticsearch.yml file of the new server, and that the other server is listed in the discovery.zen.ping.unicast.hosts field (also in elasticsearch.yml).

As soon as you start up the new Elasticsearch instance, it will try to connect to the named cluster by asking the server listed in the discovery hosts field whether such a cluster is available. For details on how Elasticsearch discovers a cluster, have a look at Discovery.
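A minimal sketch of the new node's elasticsearch.yml, assuming the existing node is reachable at 10.0.0.1 (a hypothetical address) and that you kept the default cluster name:

cluster.name: elasticsearch
node.name: node-2
network.host: 0.0.0.0              # hypothetical; bind so other hosts can reach it
discovery.zen.ping.unicast.hosts: ["10.0.0.1"]

With a second data node up, the UNASSIGNED replica shards in your cat shards output would also get allocated, turning the cluster green.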

I have a lot of indices. Do I need to shrink them one by one?

Yes, I believe so, but you should be able to script it. Another option would be to use the reindex API to reindex your daily indices into monthly indices, after which you can simply delete the daily indices.
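The reindex route could look like this sketch (October 2017 as the example month; the monthly target name filebeat-2017.10 is a choice, not a requirement):

curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{
  "source": { "index": "filebeat-2017.10.*" },
  "dest": { "index": "filebeat-2017.10" }
}'

# After verifying the document count matches, delete the daily indices:
curl -XDELETE 'localhost:9200/filebeat-2017.10.*'

The new monthly index picks up the single-shard setting from the template as long as its name matches the template's pattern.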

How do I configure the index pattern in Kibana to match the new indices I create?

The filebeat-* index pattern will match both filebeat-2018.11.19 and filebeat-2018.11. While you are creating the new index, your data for that month will exist in two matching indices, which means you will get incorrect results until the reindexing has completed. As your indices look quite small, though, I would expect the reindex operation to be quite quick.

So basically I need to do the following:

  1. Create a template

{
  "filebeat" : {
    "order" : 0,
    "template" : "filebeat-*",
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [
          {
            "template1" : {
              "mapping" : {
                "ignore_above" : 1024,
                "index" : "not_analyzed",
                "type" : "{dynamic_type}",
                "doc_values" : true
              },
              "match" : "*"
            }
          }
        ],
        "_all" : {
          "norms" : {
            "enabled" : false
          },
          "enabled" : true
        },
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "geoip" : {
            "dynamic" : true,
            "type" : "object",
            "properties" : {
              "location" : {
                "type" : "geo_point"
              }
            }
          },
          "offset" : {
            "type" : "long",
            "doc_values" : "true"
          },
          "message" : {
            "index" : "analyzed",
            "type" : "string"
          }
        }
      }
    },
    "aliases" : { }
  }
}

Is that correct? And how will I know that all new indices will use that template?

  2. Shrink the indices one by one. (My current indices are named filebeat-YYYY.MM.dd.) Would it be okay to shrink each one into an index with the same name? (See the sketch after this list.)

  3. Reindex.
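If you go the shrink route, item 2 can be scripted along these lines (a sketch: the -shrunk suffix is a hypothetical naming choice, since the shrink target must be a new index and cannot reuse the source name; you would delete each source index once its copy is green):

# Shrink every filebeat daily index into a single-shard copy
for idx in $(curl -s 'localhost:9200/_cat/indices/filebeat-*?h=index'); do
  # the shrink API requires the source to be read-only
  curl -XPUT "localhost:9200/${idx}/_settings" -H 'Content-Type: application/json' -d '{"index.blocks.write": true}'
  # shrink into a new single-shard index
  curl -XPOST "localhost:9200/${idx}/_shrink/${idx}-shrunk" -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 1, "index.number_of_replicas": 0}}'
done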