Blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized]

Hi...
I have an Elasticsearch cluster with 8 nodes (2 client, 3 master, 3 data), and when I query the cluster health, the system shows me:

[root@log-elasticsearch-client-01 elasticsearch]# curl -X GET 'http://log-elasticsearch-client-01.alpha.ci.ucr.ac.cr:9200/_cluster/health?pretty'
{
"cluster_name" : "log-elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 3,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : "NaN"
}
[root@log-elasticsearch-client-01 elasticsearch]#

The Elasticsearch log shows this:

[2019-03-27T09:48:42,465][WARN ][o.e.m.j.JvmGcMonitorService] [log-elasticsearch-client-01] [gc][80200] overhead, spent [2.1s] collecting in the last [2.1s]
[2019-03-27T10:59:55,608][WARN ][r.suppressed ] [log-elasticsearch-client-01] path: /.reporting-/esqueue/_search, params: {index=.reporting-, type=esqueue, version=true}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:166) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:152) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:297) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.search.TransportSearchAction.lambda$doExecute$4(TransportSearchAction.java:193) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:114) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:87) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:215) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:68) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:124) ~[?:?]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:165) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:87) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:76) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:537) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.rest.action.search.RestSearchAction.lambda$prepareRequest$2(RestSearchAction.java:100) ~[elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:97) [elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.xpack.security.rest.SecurityRestFilter.handleRequest(SecurityRestFilter.java:72) [x-pack-security-6.6.1.jar:6.6.1]
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240) [elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336) [elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-6.6.1.jar:6.6.1]
at org.elasticsearch.http.netty4.Netty4HttpServerTransport.dispatchRequest(Netty4HttpServerTransport.java:551) [transport-netty4-client-6.6.1.jar:6.6.1]
at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:137) [transport-netty4-client-6.6.1.jar:6.6.1]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at org.elasticsearch.http.netty4.pipelining.HttpPipeliningHandler.channelRead(HttpPipeliningHandler.java:68) [transport-netty4-client-6.6.1.jar:6.6.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at org.elasticsearch.http.netty4.cors.Netty4CorsHandler.channelRead(Netty4CorsHandler.java:86) [transport-netty4-client-6.6.1.jar:6.6.1]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]

Hi @sblancocr,

it looks like you have lost all shard data for your cluster. Did you maybe repurpose some of the nodes recently? I think this could happen if you originally had all nodes with all roles and then changed the configuration as per your description above.

Notice that the cluster health status is red, meaning you have indices that are missing data. Also notice that active_primary_shards and active_shards are 0, which means that no indices are allocated.

Thanks...

I'm testing the tool.

That's why I don't know much about it yet...

How can I delete the data to bring the service back up?

Hi @sblancocr,

if you really want to wipe all data in the entire cluster, you can stop all nodes and then delete all your data folders (by default these reside in the installation dir; you may have configured them to reside elsewhere using the path.data setting). Be careful: this will delete all data, essentially reverting to a fresh installation.
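As a rough sketch, assuming a package (RPM/DEB) install where path.data defaults to /var/lib/elasticsearch and the service is managed by systemd (verify both against your own elasticsearch.yml before running anything), the wipe on each node would look like:

```shell
# Run on EVERY node in the cluster. WARNING: permanently deletes all cluster data.

# 1. Stop Elasticsearch on all nodes first.
sudo systemctl stop elasticsearch

# 2. Delete the data folder. The path below is the package default;
#    check the path.data setting in /etc/elasticsearch/elasticsearch.yml first.
sudo rm -rf /var/lib/elasticsearch/nodes

# 3. Start the node again only once every node has been wiped.
sudo systemctl start elasticsearch
```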

Alternatively, simply setting node.data: true on all nodes should recover the missing data. Provided all your indices have at least one replica, you should then be able to repurpose one node at a time, waiting for green cluster health after each node has been repurposed (thereby giving Elasticsearch time to re-establish two copies of every shard).
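A minimal sketch of that rolling approach, assuming the default config location /etc/elasticsearch/elasticsearch.yml and a node reachable on localhost:9200:

```shell
# 1. In /etc/elasticsearch/elasticsearch.yml on the node, set:
#      node.data: true
#    then restart so its shard copies become visible to the cluster again.
sudo systemctl restart elasticsearch

# 2. Before repurposing the next node, wait until the cluster is green,
#    i.e. every shard has its full set of copies again.
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=5m&pretty'
```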

Thank you.
I'm going to set node.data: true to try it and see how the recovery goes.

Hello,
I deleted all the data and restarted all the nodes; nevertheless, the cluster status is still "red".
I ran the following query to check the status of the cluster; the output is below.

 curl -X GET 'http://log-elasticsearch-client-01:9200/_cluster/stats?human&pretty'
{
  "_nodes" : {
    "total" : 8,
    "successful" : 8,
    "failed" : 0
  },
  "cluster_name" : "log-elasticsearch",
  "cluster_uuid" : "G2YlYLizQYyiwC-Rpc0KYg",
  "timestamp" : 1554844834402,
  "status" : "red",
  "indices" : {
    "count" : 0,
    "shards" : { },
    "docs" : {
      "count" : 0,
      "deleted" : 0
    },
    "store" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 0,
      "memory" : "0b",
      "memory_in_bytes" : 0,
      "terms_memory" : "0b",
      "terms_memory_in_bytes" : 0,
      "stored_fields_memory" : "0b",
      "stored_fields_memory_in_bytes" : 0,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "0b",
      "norms_memory_in_bytes" : 0,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "0b",
      "doc_values_memory_in_bytes" : 0,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "0b",
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -9223372036854775808,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 8,
      "data" : 3,
      "coordinating_only" : 2,
      "master" : 3,
      "ingest" : 0
    },
    "versions" : [
      "6.7.1"
    ],
    "os" : {
      "available_processors" : 16,
      "allocated_processors" : 16,
      "names" : [
        {
          "name" : "Linux",
          "count" : 8
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 8
        }
      ],
      "mem" : {
        "total" : "29.5gb",
        "total_in_bytes" : 31770730496,
        "free" : "1.9gb",
        "free_in_bytes" : 2128527360,
        "used" : "27.6gb",
        "used_in_bytes" : 29642203136,
        "free_percent" : 7,
        "used_percent" : 93
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 359,
        "max" : 371,
        "avg" : 365
      }
    },
    "jvm" : {
      "max_uptime" : "30.1m",
      "max_uptime_in_millis" : 1807670,
      "versions" : [
        {
          "version" : "1.8.0_131",
          "vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
          "vm_version" : "25.131-b11",
          "vm_vendor" : "Oracle Corporation",
          "count" : 8
        }
      ],
      "mem" : {
        "heap_used" : "1.4gb",
        "heap_used_in_bytes" : 1568928224,
        "heap_max" : "23.8gb",
        "heap_max_in_bytes" : 25630343168
      },
      "threads" : 239
    },
    "fs" : {
      "total" : "107.9gb",
      "total_in_bytes" : 115880230912,
      "free" : "88gb",
      "free_in_bytes" : 94576070656,
      "available" : "88gb",
      "available_in_bytes" : 94576070656
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 8
      },
      "http_types" : {
        "security4" : 8
      }
    }
  }
}

If I query the indices, this is what it shows:

curl -sS -XGET "http://log-elasticsearch-client-01.alpha.ci.ucr.ac.cr:9200/_cat/indices?"
red open metricbeat-2019.03.25 JffV9ODwRY-akZTsEb0c0Q 5 1    
red open .kibana_1             FjS7uuL9RpaebBYYh1m9gg 1 1    
red open metricbeat-2019.03.21 2ADyV09qTg6fWRZsHe_kLA 5 1    
red open metricbeat-2019.03.05 2GeJJBXkSBWXP7wrEH-CHA 5 1

How did you delete the data? Via Elasticsearch (curl -X DELETE ...) commands, or just rm on the data node instances?

By the way, as a learner it may be better to start with single-node Elasticsearch and Kibana instances on a laptop and work through some basic cases using sample data.

Hi @sblancocr,

let's try to find out why those indices are not allocated. You can use the allocation explain API for that, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-allocation-explain.html

Pick one of the indices and use the explain API to figure out why the primary or replica cannot be allocated. Could be a variety of reasons, like disk space, allocation filters etc.
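For example, picking one of the indices from your _cat/indices output (the index name, shard number, and host here are placeholders; adjust them to your cluster), the call would look roughly like:

```shell
# Ask Elasticsearch to explain why a specific shard is unassigned.
# Replace the index name and shard number with the ones you want to inspect.
curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -d '{
    "index": "metricbeat-2019.03.25",
    "shard": 0,
    "primary": true
  }'
```

The response includes an "unassigned_info" reason and a per-node breakdown of why allocation was not possible (disk thresholds, allocation filters, missing shard copies, etc.).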

Also, to double check that all data folders were actually deleted, you can do:

curl localhost:9200/_cat/indices?h=h,s,i,id,p,r,dc,dd,ss,creation.date.string

The creation dates should then show timestamps from after you cleaned everything out.

Maybe I need to clarify too that in order to wipe the cluster completely and start over, you have to shut down all 8 nodes in the cluster and then delete all the data folders (as pointed to by path.data) on all 8 nodes before starting them again. Be warned that this will permanently delete all data.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.