Elasticsearch change state

From the information you've sent:

  • the cluster is not receiving many indexing requests, so the instability does not seem to be triggered by indexing activity
  • field data usage is ok, so the heap consumption doesn't come from global ordinals
  • you have too many shards per node: this means a lot of JVM heap is being used just to track the segments/shards of the cluster
    • too many shards might lead to too many open file descriptors: do you have the full stack trace following java.nio.channels.ClosedChannelException?
    • try to force merge old indices you know are read-only (ones you will no longer write to), or delete old data from the cluster
  • the cluster is green now, but garbage collection is making it unstable: some nodes become so busy that they do not reply to pings for more than 15s

With GET _cat/shards?v&bytes=b we can work out the number of shards, their average size and the number of shards per index, which would confirm whether the shard count is the problem.
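
If it helps, here is a minimal Python sketch of that check. It assumes you fetched the same data as JSON (GET _cat/shards?format=json&bytes=b) and loaded it into a list of dicts; the function name summarize_shards and the sample rows are made up for illustration and are not taken from your cluster.

```python
from collections import Counter


def summarize_shards(rows):
    """Summarize rows shaped like the JSON output of _cat/shards.

    Assumes each row has "index", "node" and "store" keys, and that
    "store" is a byte count given as a string (as with bytes=b).
    """
    per_node = Counter()
    per_index = Counter()
    sizes = []
    for row in rows:
        per_index[row["index"]] += 1
        # Unassigned shards report no node and no store size.
        if row.get("node"):
            per_node[row["node"]] += 1
        if row.get("store") is not None:
            sizes.append(int(row["store"]))
    avg_bytes = sum(sizes) / len(sizes) if sizes else 0.0
    return {
        "total_shards": len(rows),
        "avg_shard_bytes": avg_bytes,
        "shards_per_node": dict(per_node),
        "shards_per_index": dict(per_index),
    }


# Hypothetical sample rows, only to show the expected input shape.
sample_rows = [
    {"index": "logs-a", "prirep": "p", "store": "1000", "node": "node-1"},
    {"index": "logs-a", "prirep": "r", "store": "3000", "node": "node-2"},
    {"index": "logs-b", "prirep": "p", "store": None, "node": None},
]
print(summarize_shards(sample_rows))
```

A high count in shards_per_node together with a small avg_shard_bytes would point to many small shards, which is the case where force merging or deleting old indices helps most.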