Marvel.agent.exporter create failure kills cluster

Volodymyr_Bilyachat · April 13, 2016, 11:24am

Hi all,
I have a huge problem: when Marvel can't create an index, it actually kills the cluster.
I am using version 1.4.5, and for the last month we have had a strange issue where the cluster dies almost every week. Each time, the logs contain:

[2016-04-13 12:58:48,230][ERROR][marvel.agent.exporter ] [Node1] create failure (index:[.marvel-2016.04.13] type: [index_stats]): RemoteTransportException[[Node4][inet[/IP:9300]][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsea$

skearns · April 13, 2016, 3:26pm

Hi Volodymyr,

It looks like your cluster is overloaded. Those exceptions are saying that your bulk indexing queues are full and the cluster cannot process the indexing requests from Marvel. Given that Marvel has a consistent indexing rate (every 10 seconds by default), I would imagine that there are other things going on here that are causing the issues.
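The rejection Steve describes can be illustrated with a toy model: a queue with a fixed capacity that refuses new work immediately once it is full, instead of blocking. The sketch below is a minimal Python illustration that assumes a capacity of 50 (the value in the log above) and workers that make no progress. The class and function names are hypothetical, and this is not how Elasticsearch implements its thread pools.

```python
class RejectedExecution(Exception):
    """Raised when the queue is full; loosely mirrors EsRejectedExecutionException."""


class BoundedTaskQueue:
    """Toy model of a fixed-capacity thread-pool queue (not Elasticsearch code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._tasks = []

    def submit(self, task: str) -> None:
        # Once `capacity` tasks are pending, new work is refused right away
        # rather than waiting for a free slot.
        if len(self._tasks) >= self.capacity:
            raise RejectedExecution(
                f"rejected execution (queue capacity {self.capacity})"
            )
        self._tasks.append(task)


def simulate(capacity: int, submitted: int) -> tuple:
    """Submit `submitted` requests while workers are stuck; count outcomes."""
    queue = BoundedTaskQueue(capacity)
    accepted = rejected = 0
    for i in range(submitted):
        try:
            queue.submit(f"bulk-{i}")
            accepted += 1
        except RejectedExecution:
            rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    # 60 bulk requests against a 50-slot queue whose workers are busy:
    print(simulate(capacity=50, submitted=60))  # (50, 10)
```

Any request arriving while the queue is full fails outright, which is why a steady source like Marvel starts logging errors as soon as something else fills the bulk queue on a node.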

Do you perform bulk indexing actions in other processes?

Can you confirm which version of ES, and which version of Marvel you are running?
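One way to answer the version question is to ask the cluster itself: `GET /` on any node reports the Elasticsearch version, and `GET /_nodes/plugins` lists the plugins installed on each node. The Python sketch below assumes the 1.x response shapes, with `version.number` at the root and a per-node `plugins` list whose entries have `name` and `version` fields; check this against your cluster's actual output. The helper names and the `localhost:9200` address are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (e.g. a node's REST endpoint)."""
    with urlopen(url) as response:
        return json.load(response)


def es_version(root_info: dict) -> str:
    """Elasticsearch version from the response of GET / on any node."""
    return root_info["version"]["number"]


def plugin_versions(nodes_info: dict, plugin: str = "marvel") -> dict:
    """Map node name -> installed version of `plugin` from GET /_nodes/plugins.

    Nodes without the plugin are simply absent from the result.
    """
    versions = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        name = node.get("name", node_id)  # fall back to the node id if unnamed
        for entry in node.get("plugins", []):
            if entry.get("name") == plugin:
                versions[name] = entry.get("version", "unknown")
    return versions
```

Usage against a live node would look like `es_version(fetch_json("http://localhost:9200/"))` and `plugin_versions(fetch_json("http://localhost:9200/_nodes/plugins"))`. Comparing the per-node results also shows whether every node runs the same Marvel version.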

Thanks,
Steve

pickypg · April 13, 2016, 3:27pm

Hi,

So you're running ES 1.4.5, but which version of Marvel?

The errors that you're seeing relate to bulk ingestion being rejected -- it means that the bulk queue in your nodes is backed up when Marvel tries to send more stats (index_stats in this case). Specifically, it looks like Node4 is the one backed up.

This means that that node is too busy to handle any additional bulk ingestion. Take a look at that node to see why it's backing up.
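To see whether Node4 is consistently the node that rejects, or whether rejections move around the cluster, you can tally the exporter errors by remote node. This minimal Python sketch assumes log lines in the format shown at the top of the thread, where the rejecting node appears as `RemoteTransportException[[NodeName]...`. The function name is hypothetical.

```python
import re
from collections import Counter

# Captures the remote node name in "RemoteTransportException[[Node4][inet[...]]..."
REMOTE_NODE = re.compile(r"RemoteTransportException\[\[([^\]]+)\]")
REJECTED = "EsRejectedExecutionException"


def rejections_by_node(log_lines) -> Counter:
    """Count bulk rejections per remote node in marvel.agent.exporter log lines."""
    counts = Counter()
    for line in log_lines:
        if REJECTED not in line:
            continue  # ignore unrelated log lines
        match = REMOTE_NODE.search(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Passing an open log file works because iterating a file yields lines, e.g. `rejections_by_node(open(path)).most_common()` with `path` pointing at your node's log. A single dominant node suggests a node-specific problem (hot shards, slow disk, GC pressure), while an even spread suggests the whole cluster is over capacity.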

Hope that helps,
Chris

Volodymyr_Bilyachat · April 14, 2016, 7:16am

I don't understand how Marvel versions work, so I suppose it's 1.3, because I can't find a version number anywhere.

As I see it, this is an issue with ES, so my next question is: is there any way to skip a bulk request when there is too much work? For me the problem is that the cluster dies, and to bring it back I need to restart it.
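For bulk indexing that you control (not Marvel's own exporter), a common client-side pattern is to back off when the cluster rejects a request and to skip the batch after a few attempts, rather than retrying immediately and adding to the load. This is a minimal Python sketch under stated assumptions: `send` is whatever function your client uses to submit a bulk request, and `Rejected` is a hypothetical exception your code raises when it detects a rejection. How a rejection surfaces (for example as an HTTP 429 status or an `EsRejectedExecutionException` in the bulk item response) depends on your ES version and client.

```python
import time
from typing import Callable


class Rejected(Exception):
    """Hypothetical signal that the cluster rejected a bulk request (queue full)."""


def send_with_backoff(
    send: Callable[[list], None],
    batch: list,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send `batch`, backing off exponentially on rejection.

    Returns True if the batch was accepted, or False if it was skipped after
    `max_attempts` rejections (the caller decides whether to log or spool it).
    """
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except Rejected:
            if attempt == max_attempts - 1:
                break  # give up on this batch instead of piling on
            # Wait 0.5s, 1s, 2s, ... so the node's bulk queue can drain.
            sleep(base_delay * (2 ** attempt))
    return False
```

The `sleep` parameter exists so the delays can be observed or replaced in tests. Skipping a batch means losing that data unless the caller saves it somewhere, which is an acceptable trade-off for logs or metrics but not for primary data.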

skearns · April 14, 2016, 12:04pm

Hi Volodymyr,

Given that this looks like a production cluster, I recommend that you set up a separate monitoring cluster and have Marvel send its data there, so your monitoring data lives on a different cluster from the one being monitored. This should reduce the load on your production cluster and make it easier to troubleshoot any issues with it.

To do this, you need to configure the Marvel Exporter to point at a different cluster. If you don't configure it (the default), it will store monitoring data in the local cluster.
https://www.elastic.co/guide/en/marvel/marvel-1.3/stats-export.html#stats-export
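In practice this comes down to one setting in `elasticsearch.yml` on the production nodes. The Python sketch below renders that line and refuses host lists that point back at production nodes, since that would keep the monitoring data on the cluster being monitored. The setting name `marvel.agent.exporter.hosts` is our reading of the Marvel 1.x export documentation linked above, so verify it against the page for your exact version. The function name and host names are hypothetical.

```python
import json


def exporter_hosts_setting(monitoring_hosts, production_hosts) -> str:
    """Build the elasticsearch.yml line that points Marvel at a separate cluster.

    Raises ValueError if no monitoring hosts are given, or if any exporter
    target is also a production node.
    """
    if not monitoring_hosts:
        raise ValueError("no monitoring hosts given")
    overlap = set(monitoring_hosts) & set(production_hosts)
    if overlap:
        raise ValueError(f"exporter targets production nodes: {sorted(overlap)}")
    # json.dumps yields a flow-style list, which is also valid YAML.
    return "marvel.agent.exporter.hosts: " + json.dumps(list(monitoring_hosts))
```

For example, `exporter_hosts_setting(["mon-1:9200", "mon-2:9200"], ["node1:9200"])` returns `marvel.agent.exporter.hosts: ["mon-1:9200", "mon-2:9200"]`, which goes into `elasticsearch.yml` on each production node. Listing more than one monitoring host gives the exporter somewhere else to send data if one monitoring node is down.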

Thanks,
Steve

Volodymyr_Bilyachat · April 15, 2016, 11:13am

Thank you, I will start by doing that. I also send some logging to the production cluster, so I will switch that to preproduction.
