Elasticsearch cluster on kubernetes is highly unstable

kumud.t · June 16, 2017, 1:31pm

We have a three-node Elasticsearch cluster running on a Kubernetes 1.6.4 cluster. We have spun up a dedicated AWS r4.large instance only for the three Elasticsearch containers.
We are experiencing many issues with this setup.
At random we get the error below, the cluster nodes restart, and the Elasticsearch cluster then goes into an unusable state.

[2017-06-16T13:19:46,176][ERROR][o.e.a.b.TransportBulkAction] [elasticsearch-logging-0] failed to execute pipeline for a bulk request
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$2@22924ec0 on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@adc82c9[Running, pool size = 2, active threads = 2, queued tasks = 265, completed tasks = 1450]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50) ~[elasticsearch-5.4.1.jar:5.4.1]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) ~[?:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) ~[?:1.8.0_131]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:94) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:89) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.ingest.PipelineExecutionService.executeBulkRequest(PipelineExecutionService.java:74) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportBulkAction.processBulkIndexIngestRequest(TransportBulkAction.java:508) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:136) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:85) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170) ~[elasticsearch-5.4.1.jar:5.4.1]
	at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:149) ~[?:?]
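The rejection message above already contains the numbers that explain it. The `bulk` executor has a queue capacity of 200 and only 2 threads, which matches the 2 vCPUs of an r4.large (in 5.x the bulk pool is, by default, sized to the number of available processors). With both threads busy and 265 tasks queued, new bulk work is rejected. A minimal Python sketch, assuming the log line format shown above, that pulls those counters out of such a line; `parse_rejection` is a hypothetical helper, not part of any Elasticsearch tooling:

```python
import re

# Matches the executor summary printed in EsRejectedExecutionException messages, e.g.
# "EsThreadPoolExecutor[bulk, queue capacity = 200, ...[Running, pool size = 2, ...]]".
# The format is assumed from the 5.4.1 log above and may differ in other versions.
_STATS = re.compile(
    r"EsThreadPoolExecutor\[(?P<name>\w+), queue capacity = (?P<capacity>\d+),"
    r".*?pool size = (?P<pool>\d+), active threads = (?P<active>\d+),"
    r" queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)\]"
)


def parse_rejection(line: str) -> dict:
    """Extract thread-pool counters from a rejected-execution log line (hypothetical helper)."""
    m = _STATS.search(line)
    if m is None:
        raise ValueError("no EsThreadPoolExecutor summary found")
    # Keep the pool name as a string; convert every counter to int.
    stats = {k: (v if k == "name" else int(v)) for k, v in m.groupdict().items()}
    # Once the queued tasks reach the configured capacity, further
    # submissions are rejected by EsAbortPolicy.
    stats["queue_full"] = stats["queued"] >= stats["capacity"]
    return stats
```

Run against the error line above, it reports the `bulk` pool with 2 threads, capacity 200 and 265 queued tasks, so `queue_full` is true.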

Can anyone help fix this?

PS.
While these errors occur, CPU usage is high on every node, and as a result the nodes get restarted. After that the cluster never stabilizes (some data shard check, I guess) and the nodes keep restarting. We are using fluentd to push Kubernetes container logs to Elasticsearch. We have been facing this issue for a long time. Once the cluster is up and running, it runs without any issues until a node gets restarted.
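Elasticsearch reports thread-pool rejections to HTTP clients with status 429 (Too Many Requests), so whatever ships the logs (fluentd here) should back off before resending rather than retrying immediately into a full queue. Below is a generic Python sketch of exponential backoff with jitter, assuming a hypothetical `send` callable that POSTs one payload to `/_bulk` and returns the HTTP status. It is not how the fluentd Elasticsearch plugin is implemented.

```python
import random
import time
from typing import Callable


def send_bulk_with_backoff(
    send: Callable[[bytes], int],
    payload: bytes,
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one bulk payload, retrying on HTTP 429 with exponential backoff.

    `send` is a hypothetical callable returning the HTTP status code.
    Returns the last status seen.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        status = send(payload)
        # Stop on anything other than a rejection, or when retries are used up.
        if status != 429 or attempt == max_retries:
            return status
        # Full jitter: wait a random time up to base_delay * 2**attempt (capped)
        # so that many shippers do not retry in lockstep.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Note that a bulk request can also return 200 while individual items inside the response carry a 429 status, so a real shipper would need to inspect per-item results and resend only the rejected items.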

system · July 14, 2017, 1:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Failed to execute pipeline for a bulk request	Elasticsearch	2	6203	July 19, 2018
Failed to execute pipeline for a bulk using ingest node	Elasticsearch	7	5828	April 24, 2017
Elasticsearch bulk rejection error in logstash logs	Elasticsearch	6	1313	November 19, 2020
Failed to execute bulk item error in elasticsearch pods	Elasticsearch	3	2392	May 4, 2020
Failed to execute pipeline for a bulk request but low CPU and memory	Elasticsearch	1	1447	December 5, 2017