How to delete UNASSIGNED .watches?

Hi,

The status of my cluster is RED. It seems it would turn green if I deleted the 2 UNASSIGNED ".watches" shards shown in the output below. How can I fix this? Thanks.

$ curl -XGET 'http://xxx:9200/_cat/shards'
.watcher-history-3-2017.09.15 0 r STARTED 7200 5.3mb 10.99.30.38 node1
.monitoring-es-6-2017.11.29 0 r STARTED 741817 522.9mb 10.99.30.34 node2
.monitoring-es-6-2017.11.29 0 p STARTED 741817 520.7mb 10.99.23.41 node2
.watches 0 p UNASSIGNED
.watches 0 r UNASSIGNED
.watcher-history-3-2017.10.06 0 p STARTED 6537 5.6mb 10.99.30.34 nod3

$ curl -XGET 'xxx:9200/_cluster/health?pretty'
{
"cluster_name" : "cluster",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 6,
"active_primary_shards" : 72,
"active_shards" : 139,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 2,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 1,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 98.58156028368793
}

$ curl -XGET 'http://xxx:9200/_cluster/allocation/explain?pretty'
{
"index" : ".watches",
"shard" : 0,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2017-11-28T14:31:53.291Z",
"last_allocation_status" : "no_valid_shard_copy"
},
"can_allocate" : "no_valid_shard_copy",
"allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions" : [
{
"node_id" : "xxxplqMrRjWS78Xxxxxxxx",
"node_name" : "node1",
"transport_address" : "xxx:9300",
"node_decision" : "no",
"store" : {
"in_sync" : true,
"allocation_id" : "xxxxxleuSGaMc4m2GVxxxx",
"store_exception" : {
"type" : "corrupt_index_exception",
"reason" : "failed engine (reason: [refresh failed]) (resource=preexisting_corruption)",
"caused_by" : {
"type" : "i_o_exception",
"reason" : "failed engine (reason: [refresh failed])",
"caused_by" : {
"type" : "corrupt_index_exception",
"reason" : "compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/path/to/data/nodes/0/indices/xxxx61AmRUW2rwkR8Uxxxx/0/index/_1d31.nvd")))"
}
}
}
}
},
{
"node_id" : "xxxxElcURkqsQYG_hKxxxx",
"node_name" : "node2",
"transport_address" : "xxx:9300",
"node_decision" : "no",
"store" : {
"found" : false
}
},

Hi,
Before deleting the corrupt shards or moving them out of the way, can you run cat or strings on them to check their content?
Did you recently face any disk or power issues?

This is the cat result. This node has already been excluded from the cluster by re-routing. The data seems very old to me, so it can be deleted.
I don't know the reason. Maybe it was a power issue.

curl -XGET 'http://xxx:9200/_cat/indices/.watches?pretty'
red open .watches 0aRC61AmRUW2rwkR8xxxxx 1 1

The reason indicates that a file has a size of zero bytes even though it should have contents. This points to a problem with the files on disk: an admin clearing the file, a corrupt disk, a bug in Lucene, or an unfinished recovery that was aborted before the other node vanished completely.
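To see whether other files in the same shard directory were truncated as well, something along these lines could help. This is only a minimal sketch: `list_empty_files` is a hypothetical helper name, and the directory you pass is an assumption that should be replaced with the path reported in the `store_exception` on your node. Lucene's `write.lock` is normally an empty file, so it is skipped.

```shell
# Hypothetical helper: list zero-byte files under a shard's index directory.
# $1 is the directory to scan (an assumption; take it from the
# store_exception path reported by the allocation explain output).
list_empty_files() {
  # -size 0 matches only empty files; write.lock is expected to be empty.
  find "$1" -type f -size 0 ! -name write.lock
}

# Example call (path redacted as in the explain output):
#   list_empty_files /path/to/data/nodes/0/indices/xxxx61AmRUW2rwkR8Uxxxx/0/index
```

Any segment file that shows up here (such as the `.nvd` file from the exception) would be consistent with the corruption described above.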

If you have more logs that indicate what happened, I'd be happy to take a look.

Can you explain what you mean by the node being excluded from the cluster by re-routing? Has this node split away from the cluster while still acting as a master? If that's the case, you should check your cluster configuration, especially the minimum master nodes setting.
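For reference, the usual rule for `discovery.zen.minimum_master_nodes` on pre-7.x clusters is a quorum of the master-eligible nodes, that is (N / 2) + 1. A minimal sketch of the arithmetic follows; `quorum` is a hypothetical helper, and the value 6 assumes that all six nodes from the health output are master-eligible, which the thread does not state.

```shell
# Hypothetical helper: quorum of master-eligible nodes, (N / 2) + 1,
# using integer division.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Assumes all 6 nodes in the cluster are master-eligible.
quorum 6   # prints 4
```

If only some of the nodes are master-eligible, pass that smaller count instead.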

Apart from that, if you don't need that index, you can safely delete it. I suppose you are not using Watcher, only Monitoring. Monitoring in turn creates watches, and they get recreated over time.

Hope this helps!

Hi spinscale,

By excluding the node via re-routing, I mean that I no longer want this node in the cluster, so I removed it with the following command:

curl -XPUT 'http://xxx:9200/_cluster/settings?pretty' -d '
{
"transient" : {
"cluster.routing.allocation.exclude._name" : "this node"
}
}'

You said I could safely delete it. May I ask how? I tried this command and got an error:

curl -XDELETE 'http://xxx:9200/.watches/?pretty'
{"error":"This endpoint is not supported for DELETE on .watches index.","status":400}%

Hi @tjliu,

I think if you run this request in Kibana Dev Tools, your cluster will turn green and the unassigned shards will be removed.

PUT /_settings
{
"index": {
"number_of_replicas" : 0
}
}

Hi @Krunal_kalaria

I tried your command in Kibana Dev Tools, but I got a timeout message, maybe because I'm still running a big bulk indexing job.

{
"statusCode": 504,
"error": "Gateway Timeout",
"message": "Client request timeout"
}

But I then tried the command below to set number_of_replicas to 0 for the ".watches" index only. Glad it worked: one UNASSIGNED ".watches" shard was removed. There is still one left. How do I remove this last one?

curl -XPUT 'http://xxx:9200/.watches/_settings/?pretty' -d '
{
"index": {
"number_of_replicas" : 0
}
}'

curl -XGET 'xxx:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
.watches 0 p UNASSIGNED CLUSTER_RECOVERED

curl -XDELETE 'http://xxx:9200/.watches/?pretty'
{"error":"This endpoint is not supported for DELETE on .watches index.","status":400}%

Try deleting it with a wildcard (quoted so the shell does not expand the *): curl -X DELETE 'http://xxx:9200/.watches*'
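Afterwards you can confirm that nothing is left unassigned. A minimal sketch, assuming the default `_cat/shards` column order in which the shard state is the fourth column; `count_unassigned` is a hypothetical helper name, not part of Elasticsearch.

```shell
# Hypothetical helper: count UNASSIGNED rows in `_cat/shards` output
# read from stdin. Assumes the state is the fourth whitespace-separated
# column, as in the default output and in h=index,shard,prirep,state,...
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (host placeholder as in the thread):
#   curl -s 'http://xxx:9200/_cat/shards' | count_unassigned
```

A result of 0 together with "status" : "green" from `_cluster/health` would indicate that the delete had the intended effect.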


hi @spinscale

Thanks a lot. It worked. My cluster is finally green.

@spinscale, good solution.
