Problems in my Cluster

McElroy · May 17, 2017, 7:23am

Hi,

We are experiencing some trouble with our cluster. When we come into the office on Monday, one or two of our nodes are gone, including the master.
I also get this message in the logs:
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
From what I read here in the forum, that could be because I have too many shards, which is highly possible when I look at my cluster health:

{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 4,
"active_primary_shards" : 9100,
"active_shards" : 23225,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

Across all indices we have approximately 20 million hits.

I would really appreciate any approach to improving the stability of my cluster, because getting my cluster health back to green is a pain in the neck.

kind regards
Andy

Christian_Dahlqvist · May 17, 2017, 7:30am

You have far too many shards for a cluster that size. You need to revise your sharing strategy and bring that down by at least an order of magnitude or so. Aim to have an average shard size between a few GB and a few tens of GB.
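To put the advice above into numbers, here is a rough sketch using only the figures from the cluster health output in the first post. Reading "an order of magnitude" as a factor of ten is an interpretation, not an official limit.

```python
# Figures copied from the cluster health output in this thread.
active_shards = 23225
active_primary_shards = 9100
data_nodes = 4
total_docs = 20_000_000  # "approximately 20 million hits"

# How many shards each data node currently has to manage.
shards_per_data_node = active_shards / data_nodes

# Average number of documents held by one primary shard.
docs_per_primary = total_docs / active_primary_shards

# Interpreting "at least an order of magnitude" as a tenth of today's count.
target_total = active_shards // 10

print(f"shards per data node now: {shards_per_data_node:.2f}")
print(f"documents per primary shard: {docs_per_primary:.0f}")
print(f"target total shard count: at most about {target_total}")
```

A few thousand documents per primary shard is far below the "few GB" shard size suggested above, which is why the shard count rather than the data volume is the problem here.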

McElroy · May 17, 2017, 7:34am

Where can I check the size of a shard?
And which configuration would you recommend for the case I have, if I am allowed to ask?

Christian_Dahlqvist · May 17, 2017, 7:37am

You can check shard and index size through the _cat/indices and _cat/shards APIs. What type of data do you have in the cluster? What is your current sharding strategy? If you are using time-based indices, what is your retention period? Which version of Elasticsearch are you using?
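As an illustration of the _cat/shards API mentioned above, the following sketch lists the smallest indices by primary store size. The URL is an assumption (a node on localhost without authentication), and the helper names are made up for this example; the `format`, `bytes` and `h` query parameters are standard cat API options.

```python
import json
import urllib.request
from collections import defaultdict

# Assumption: a node reachable on localhost without authentication.
ES_URL = "http://localhost:9200"


def fetch_shards(base_url=ES_URL):
    """Fetch one record per shard; sizes are reported in plain bytes."""
    url = base_url + "/_cat/shards?format=json&bytes=b&h=index,shard,prirep,store"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def primary_bytes_per_index(shards):
    """Sum the store size of the primary shards of each index."""
    totals = defaultdict(int)
    for shard in shards:
        # Unassigned shards report no store size, so skip them.
        if shard["prirep"] == "p" and shard.get("store"):
            totals[shard["index"]] += int(shard["store"])
    return dict(totals)


def smallest_indices(shards, limit=20):
    """Return (index, bytes) pairs, smallest first: candidates to merge."""
    totals = primary_bytes_per_index(shards)
    return sorted(totals.items(), key=lambda item: item[1])[:limit]


# Usage against a live cluster:
# for name, size in smallest_indices(fetch_shards()):
#     print(f"{name}: {size / 1024**3:.2f} GB")
```

Indices that show up at the top of this list with several primary shards each are the ones where the default of 5 shards costs the most.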

McElroy · May 17, 2017, 7:49am

Thank you.
OK, my biggest index is around 18 GB, and some of my shards are around 1.5 GB.
We are using it for Apache log files, some Windows service logs and, for about a month now, the output of our Docker containers.
We create a new index every day, but we have 9 indices.
We are running 5.0.1.
I am not quite sure what you meant by sharing strategy, but if it is the shard and replica config, here it is:

{
  "logstash-2017.05.16" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "5s",
        "number_of_shards" : "5",
        "provided_name" : "logstash-2017.05.16",
        "creation_date" : "1494892819101",
        "number_of_replicas" : "1",
        "uuid" : "pPFz1d3EQEe6XY-dlw344w",
        "version" : {
          "created" : "5000199"
        }
      }
    }
  }
}

Christian_Dahlqvist · May 17, 2017, 7:56am

That was supposed to be sharding, not sharing. The biggest index seems OK, but it probably does not need 5 primary shards. Adjust the number of primary shards and do not use the default of 5 for very small indices. Also consider consolidating small indices and/or using weekly or even monthly indices instead of daily.
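One way to stop new indices from getting 5 primary shards is an index template. The sketch below only builds and prints the request body; the template name `fewer_shards`, the pattern and the `order` value are assumptions for illustration. On 5.x the pattern field is called `template`, and if Logstash installs its own template, the one with the higher `order` wins for overlapping settings.

```python
import json


def small_index_template(pattern, primaries=1, replicas=1, order=1):
    """Build a 5.x-style index template body with a reduced shard count."""
    return {
        "template": pattern,  # index name pattern the template applies to
        "order": order,       # higher order overrides lower-order templates
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        },
    }


body = small_index_template("logstash-*")

# Hypothetical template name; send the body with any HTTP client.
print("PUT /_template/fewer_shards")
print(json.dumps(body, indent=2))
```

The template only affects indices created after it is installed, so existing indices keep their current shard count.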

McElroy · May 17, 2017, 7:59am

If I am not completely wrong I can't change the shard size to anything smaller without removing the index?
But first of all thank you for your help. You already helped me a lot.

Christian_Dahlqvist · May 17, 2017, 8:03am

As you are on Elasticsearch 5.x, the shrink index API can help you get from 5 to 1 shard per index. You may also be able to reduce the number of replicas you have configured in order to bring the shard count down. Beyond that (and I think you will need to reduce the shard count further), you will need to reindex data. This can take time, but do change the settings for newly created indices so that you generate fewer new shards per day right away.
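A minimal sketch of the two requests a shrink from 5 to 1 primary shard involves, following the procedure in the shrink index documentation: first move a copy of every shard to one node and block writes, then shrink into a new index. The node name and the target index name are placeholders, and the function only builds the requests rather than sending them.

```python
def shrink_requests(source, target, node_name, replicas=1):
    """Return (method, path, body) tuples for shrinking an index to 1 shard."""
    prepare = {
        "settings": {
            # All shards of the source must have a copy on a single node.
            "index.routing.allocation.require._name": node_name,
            # The source index must be read-only while it is shrunk.
            "index.blocks.write": True,
        }
    }
    shrink = {
        "settings": {
            "index.number_of_shards": 1,
            "index.number_of_replicas": replicas,
        }
    }
    return [
        ("PUT", f"/{source}/_settings", prepare),
        ("POST", f"/{source}/_shrink/{target}", shrink),
    ]


# Placeholder names: adjust the node and target index to your cluster.
requests_to_send = shrink_requests(
    "logstash-2017.05.16", "logstash-2017.05.16-shrunk", "node-1"
)
for method, path, body in requests_to_send:
    print(method, path, body)
```

The second request should only be sent once relocation from the first has finished, and the shard count only drops after the original index has been verified against the new one and deleted.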

McElroy · May 17, 2017, 8:11am

Can you give me advice on reindexing as well? I have never done that before.

Christian_Dahlqvist · May 17, 2017, 8:15am

You should be able to use the reindex API to do this.
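For someone who has not used the reindex API before, a request that consolidates one month of daily indices into a single monthly index could look roughly like this. The index names are assumptions based on the `logstash-YYYY.MM.DD` naming shown earlier in the thread, and the sketch only prints the request instead of sending it.

```python
import json


def monthly_reindex_body(prefix, year, month):
    """Build a _reindex body copying a month of daily indices into one index."""
    month_name = f"{prefix}-{year:04d}.{month:02d}"
    return {
        # The trailing ".*" matches the daily indices but not the target.
        "source": {"index": f"{month_name}.*"},
        "dest": {"index": month_name},
    }


body = monthly_reindex_body("logstash", 2017, 4)

print("POST /_reindex")
print(json.dumps(body, indent=2))
```

The destination index is created with whatever settings and templates apply to its name, so it is worth setting the shard count for it first, and the daily indices have to be deleted afterwards for the total shard count to go down.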

McElroy · May 17, 2017, 8:20am

McElroy · May 23, 2017, 6:03am

I think this looks a whole lot better.
Thank you again for your help

{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 4,
"active_primary_shards" : 2866,
"active_shards" : 5977,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
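For comparison, a quick calculation on the two cluster health outputs in this thread shows how far the shard count came down. All numbers are taken from the outputs above; nothing else is assumed.

```python
# Shard counts from the two cluster health outputs in this thread.
before = {"active_primary_shards": 9100, "active_shards": 23225}
after = {"active_primary_shards": 2866, "active_shards": 5977}
data_nodes = 4

# Factor by which the total shard count shrank.
reduction_factor = before["active_shards"] / after["active_shards"]

# Share of the original shards that are gone, in percent.
reduction_percent = 100 * (1 - after["active_shards"] / before["active_shards"])

# Shards each data node still has to manage.
shards_per_data_node = after["active_shards"] / data_nodes

print(f"total shards reduced by a factor of {reduction_factor:.1f}")
print(f"that is {reduction_percent:.0f}% fewer shards")
print(f"shards per data node now: {shards_per_data_node:.2f}")
```

This is a large improvement, although it is still short of the order-of-magnitude reduction suggested earlier in the thread.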

