Metricbeat index turns red every few hours

I run a two-node Elasticsearch 5.5.1 cluster that holds a Metricbeat index. About every three hours the Metricbeat index turns red.

The cluster status:

{
  "cluster_name": "elasticsearch",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_primary_shards": 16,
  "active_shards": 31,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 2,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 93.93939393939394
}

The shards are unassigned:

metricbeat-2017.08.31           0 p UNASSIGNED                        
metricbeat-2017.08.31           0 r UNASSIGNED    
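To find out why those shards are unassigned, the cluster allocation explain API (available since Elasticsearch 5.0) can be queried directly. A minimal sketch, assuming the cluster is reachable on `localhost:9200` (adjust host and port for your setup):

```shell
# Ask Elasticsearch why a specific shard is unassigned.
# The index name, shard number and primary flag match the
# unassigned metricbeat shard shown above.
curl -XPOST 'localhost:9200/_cluster/allocation/explain?pretty' \
  -H 'Content-Type: application/json' -d '{
  "index": "metricbeat-2017.08.31",
  "shard": 0,
  "primary": true
}'
```

The response includes an `unassigned_info` section with the reason (e.g. `NODE_LEFT`, `ALLOCATION_FAILED`) and a per-node explanation of why the shard cannot be allocated.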

I created a template with one shard and one replica.

The same happens to my other indices, but much later, since they get less traffic.

How can I fix this? What other information do you need to help me?

Two nodes is bad; you have no majority there for consensus.
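For context: with two master-eligible nodes there is no safe quorum, so the usual recommendation is a third master-eligible node. A sketch of the corresponding `elasticsearch.yml` setting on 5.x (the formula is a documented rule, the node count of three is an assumption):

```yaml
# elasticsearch.yml — sketch for a cluster with 3 master-eligible nodes.
# Rule of thumb: minimum_master_nodes = (master_eligible_nodes / 2) + 1
#                                     = (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2
```

With only two master-eligible nodes this value would have to be 2 as well, which means the cluster cannot elect a master when either node is down, so the setting does not really help until a third node exists.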

What do the logs show when this state change occurs?

Visualize: Request to Elasticsearch failed: {"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]},"status":503}

in the Kibana UI.

What about in Elasticsearch?

elastic-logstash-1.1.9q2k1638sy7l@ucore02.solutions.test    | [2017-08-31T23:26:27,284][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[metricbeat-2017.08.31][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[metricbeat-2017.08.31][0]] containing [94] requests]"})
elastic-logstash-1.1.9q2k1638sy7l@ucore02.solutions.test    | [2017-08-31T23:26:27,284][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[metricbeat-2017.08.31][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[metricbeat-2017.08.31][0]] containing [94] requests]"})

The primary shard is not active:

elastic-logstash-1.1.y03yuuhpi2fp@ucore01.solutions.test    | [2017-09-01T08:26:08,287][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[metricbeat-2017.09.01][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[metricbeat-2017.09.01][0]] containing [125] requests]"})
elastic-logstash-1.1.y03yuuhpi2fp@ucore01.solutions.test    | [2017-09-01T08:26:08,287][INFO ][logstash.outputs.elasticsearch] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>125}

Maybe useful information: it runs in Docker.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.