Watcher not starting on hosted cluster

Hello all,

I have a cluster hosted on Elastic Cloud and I can index documents into it, but when I try to create or delete watches, I receive the following error:

{
   "error": "ElasticsearchIllegalStateException[not started]",
   "status": 500
}

Because this is a hosted cluster, I don't have logs or any other useful diagnostics. /_cat/indices shows the watcher indices and /_cat/plugins shows that the watcher plugin is installed. What else can I do?

Hey,

Out of curiosity, I assume this is a cluster hosted on Elastic Cloud?

Can you manually start Watcher using the start API and paste what is returned?

--Alex

Yes, sorry. Hosted by Elastic Cloud.

Both _watcher/_start and _watcher/_restart return

{
   "acknowledged": true
}

But my attempts to post a watch still return

{
    "error": "RemoteTransportException[[tiebreaker-0000000023][inet[/REDACTED-IP]]  [cluster:admin/watcher/watch/put]]; nested: ElasticsearchIllegalStateException[not started]; ",
    "status": 500
}
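
For anyone hitting the same thing: as the output above shows, _watcher/_start can reply with "acknowledged": true while watch requests still fail with "not started". Rather than retrying the PUT by hand, you can poll until the service reports that it is up. Below is a minimal sketch in Python. It assumes the Watcher stats API (GET _watcher/stats) reports a watcher_state field; the function name wait_until_started and the get_state callable are hypothetical, so wire them to however you talk to your cluster.

```python
import time
from typing import Callable


def wait_until_started(
    get_state: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it returns "started" or the timeout expires.

    get_state is a hypothetical callable supplied by the caller, for example
    one that reads the "watcher_state" field from GET _watcher/stats
    (assumed field name; check the response of your version).
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "started":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The state lookup is passed in as a function so the loop stays independent of any HTTP client or credentials. The default timeout of 30 minutes is an arbitrary choice.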

Hey,

Interesting. Let's try to debug this further.

  1. You do have access to the logs in Cloud via the Logs tab. Can you search for watcher, or maybe just paste all the entries that occur when you try to start it?
  2. Is it possible that you lost some shards of the watcher-related indices? Can you run
GET _cat/shards/.w*
GET _cat/shards/.t*

and show the results?
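
If the shard listing is long, a small script can pick out anything that is not STARTED. This is only a sketch: it assumes the default _cat/shards column order (index, shard, prirep, state, ...) with no header row, the function name shards_not_started is made up, and the sample text is invented for illustration rather than taken from any real cluster.

```python
from typing import List, Tuple


def shards_not_started(cat_shards_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (index, shard, prirep, state) for every shard not in STARTED state."""
    problems = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            problems.append((index, shard, prirep, state))
    return problems


# Made-up sample output, for illustration only (not from this thread).
sample = """\
.watches           0 p STARTED    3 12.1kb 10.0.0.1 node-a
.watches           0 r UNASSIGNED
.triggered_watches 0 p STARTED    0   115b 10.0.0.1 node-a
"""

print(shards_not_started(sample))
```

An empty list means every listed shard is STARTED, so lost shards can be ruled out.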

--Alex

I can't believe I missed the logs tab :sob:

[2017-01-05 16:45:49,351][WARN ][watcher ] [tiebreaker-0000000023] failed to start watcher. please wait for the cluster to become ready or try to start Watcher manually
org.elasticsearch.index.engine.DocumentAlreadyExistsException: [.watch_history-2017.01.04][0] [watch_record][danger-room-ab9c3479-5926-4686-8375-64d8b5075780_12-2017-01-04T00:00:00.096Z]: document already exists
    at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:329)
    at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:287)
    at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:259)
    at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:482)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:206)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I deleted the .watch_history* indices and attempted to restart it again, but it looks like it hung, with nothing in the logs other than "[INFO ][watcher ] starting watch service..."

Even after a full cluster restart, nothing is happening in the logs. All of my shards are in a STARTED state as well.

Looks like it just took a long time to start. Thanks for your help

17:22:42	INFO	watcher	[2017-01-05 17:22:42,604][INFO ][watcher ] watch service has started
17:06:37	INFO	watcher	[2017-01-05 17:06:37,444][INFO ][watcher ] starting watch service...
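
To put a number on it, the two bracketed timestamps above can be subtracted directly. A quick sketch in Python; the helper name log_time is made up, and the regex assumes the [YYYY-MM-DD HH:MM:SS,mmm] prefix seen in these log lines.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp at the start of each log entry.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def log_time(line: str) -> datetime:
    """Extract the bracketed timestamp from one log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


started = "[2017-01-05 17:22:42,604][INFO ][watcher ] watch service has started"
starting = "[2017-01-05 17:06:37,444][INFO ][watcher ] starting watch service..."

elapsed = log_time(started) - log_time(starting)
print(elapsed)  # 0:16:05.160000
```

So the watch service needed a little over 16 minutes between "starting" and "has started".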

Hey,

Wow, that's a long time to start up! I guess it is too late now, but did any of those indices (especially .triggered_watches) contain a lot of documents?

--Alex

I removed all the watches beforehand, so no, unless .triggered_watches was full of deleted documents or something similar that made it slow.
