Watch is stuck at error, always needs a manual restart

Hi, Alex. Thanks for replying.

The watch works fine when I click the 'Send Request' button while creating a test watch and check its output. It only throws connect_timeout_exception when it is left to execute on its own.

This is the output of GET _watcher/stats?emit_stacktraces=true:

{
  "_nodes" : {
    "total" : 8,
    "successful" : 8,
    "failed" : 0
  },
  "cluster_name" : "abc",
  "manually_stopped" : false,
  "stats" : [
    {
      "node_id" : "lyELuRG0QAS64C93O-oqRg",
      "watcher_state" : "started",
      "watch_count" : 0,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 1
      }
    },
    {
      "node_id" : "eEJyFdOwT3-MCKTGVx3p4w",
      "watcher_state" : "started",
      "watch_count" : 2,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 40
      }
    },
    {
      "node_id" : "O3ezUXu2Q8qUq62scF11_g",
      "watcher_state" : "started",
      "watch_count" : 0,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 0
      }
    },
    {
      "node_id" : "y6wGcIPQRHmhePoYtxI01Q",
      "watcher_state" : "started",
      "watch_count" : 0,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 0
      }
    },
    {
      "node_id" : "57shI5qnS86BfgMSKaSsYg",
      "watcher_state" : "started",
      "watch_count" : 0,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 40
      }
    },
    {
      "node_id" : "qR_AGOwJRNuBxsqPOWuRrA",
      "watcher_state" : "started",
      "watch_count" : 8,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 40
      }
    },
    {
      "node_id" : "u9zQ0k-gS72zDbftee7OIQ",
      "watcher_state" : "started",
      "watch_count" : 0,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 0
      }
    },
    {
      "node_id" : "wAP1CsuERLOQdECfH4y3ew",
      "watcher_state" : "started",
      "watch_count" : 0,
      "execution_thread_pool" : {
        "queue_size" : 0,
        "max_size" : 0
      }
    }
  ]
}
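
As a minimal sketch (not part of the original output), the nodes that actually hold watches can be picked out of a response like the one above with a few lines of Python. The `stats_json` sample is trimmed to three of the eight nodes shown above, and the helper name `nodes_with_watches` is hypothetical.

```python
import json

# Trimmed sample of the GET _watcher/stats response above (three of the eight nodes).
stats_json = """
{
  "stats": [
    {"node_id": "lyELuRG0QAS64C93O-oqRg", "watcher_state": "started", "watch_count": 0,
     "execution_thread_pool": {"queue_size": 0, "max_size": 1}},
    {"node_id": "eEJyFdOwT3-MCKTGVx3p4w", "watcher_state": "started", "watch_count": 2,
     "execution_thread_pool": {"queue_size": 0, "max_size": 40}},
    {"node_id": "qR_AGOwJRNuBxsqPOWuRrA", "watcher_state": "started", "watch_count": 8,
     "execution_thread_pool": {"queue_size": 0, "max_size": 40}}
  ]
}
"""


def nodes_with_watches(response):
    """Return {node_id: watch_count} for nodes that hold at least one watch."""
    return {
        node["node_id"]: node["watch_count"]
        for node in response["stats"]
        if node.get("watch_count", 0) > 0
    }


response = json.loads(stats_json)
active = nodes_with_watches(response)
for node_id, count in active.items():
    print(f"{node_id}: {count} watches")
```

Run against the full output, this kind of filter narrows the eight nodes down to the ones worth checking for connection problems.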

The watch is running on node qR_AGOwJRNuBxsqPOWuRrA, and when I checked the node stats I saw the following output:

"failures" : [
  {
    "type" : "failed_node_exception",
    "reason" : "Failed node [qR_AGOwJRNuBxsqPOWuRrA]",
    "node_id" : "qR_AGOwJRNuBxsqPOWuRrA",
    "caused_by" : {
      "type" : "translog_exception",
      "reason" : "Unable to get the earliest last modified time for the transaction log",
      "index_uuid" : "Gd5TSGq7Qoaut5hrpNocJw",
      "shard" : "0",
      "index" : ".kibana_1"
    }
  }
]

}

Could that be the reason? The .kibana_1 index has 2 shards, and one of them is missing translog files. How do I fix that?

Thanks.