Resend after SMTPSendFailedException?

Folks were complaining about a Watcher 2.4 alert which had been successfully logged, so when I went to the history I found it had triggered but the email action failed:
"actions": [
"id": "alert_email",
"type": "email",
"status": "failure",
"reason": "MessagingException[failed to send email with subject [Email Subject] via account [work]]; nested: SMTPSendFailedException[[EOF]]; "

This problem seems uncommon, but I can't find a setting which would attempt resend or send a warning about failed emails. Where can I find deeper information about the error? Is there a way to resend this? Do I need an alert to look for recent failed alert actions? Would updating to 5.x gain me any of that?

I don't believe there's a direct way to "retry" a failed watch, but you can re-execute it with the _execute API, and use either trigger_data or alternative_input to query or fill in the data that will be used in the action.
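As a rough sketch of that re-execution, the Python below only builds the request path and body; it does not contact a cluster. The watch id `my_watch` and the `build_execute_request` helper are made up for illustration, and the `/_watcher/watch/<id>/_execute` path is assumed from the 2.x layout (5.x puts it under `_xpack`), so check it against the docs for your version.

```python
import json


def build_execute_request(watch_id, action_id, trigger_data=None, alternative_input=None):
    """Return (path, body) for re-running a watch through the execute API.

    Hypothetical helper: it only assembles the request, it sends nothing.
    """
    body = {
        # Run the actions even if the condition would not match right now.
        "ignore_condition": True,
        # Force the failed action to run for real, bypassing throttling.
        "action_modes": {action_id: "force_execute"},
        # Write this run to the watch history so the outcome can be checked.
        "record_execution": True,
    }
    if trigger_data is not None:
        body["trigger_data"] = trigger_data
    if alternative_input is not None:
        # Payload used in place of the watch's own input.
        body["alternative_input"] = alternative_input
    # Assumed 2.x-style path; 5.x uses /_xpack/watcher/watch/<id>/_execute.
    path = "/_watcher/watch/{}/_execute".format(watch_id)
    return path, body


path, body = build_execute_request("my_watch", "alert_email")
print("POST " + path)
print(json.dumps(body, indent=2))
```

Leaving out `alternative_input` makes the watch run its normal input again, so the resent email reflects current data rather than the data from the failed run.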

As for getting a warning about failed actions, I'm not sure. I was hoping to find some error handling settings in either the email configuration or perhaps a global error notifier in Watcher, but I don't see either in the docs. So short of watching the logs or the watch histories, I'm not sure of the best way to get notified.

One possible solution would be to add a watch against the .watcher-history indices that looks for failed actions and triggers some other kind of action. It seems kind of hacky, but it should work. There might be a better solution, but I don't see anything in the docs.
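A minimal sketch of what such a watch body might look like, built in Python so it can be printed and pasted. Everything here is an assumption to verify against your own history documents: the index pattern (the history index name differs between versions), the `result.actions.status` and `trigger_event.triggered_time` field names, and the hypothetical `build_failed_action_watch` helper and `log_failures` action id.

```python
import json


def build_failed_action_watch(index_pattern=".watcher-history*", lookback="10m"):
    """Return a watch body that fires when recent history records contain a failed action.

    Hypothetical helper; the field names and index pattern are assumptions.
    """
    return {
        # Run on the same interval as the lookback window.
        "trigger": {"schedule": {"interval": lookback}},
        "input": {
            "search": {
                "request": {
                    "indices": [index_pattern],
                    "body": {
                        "query": {
                            "bool": {
                                "filter": [
                                    # Matches the "status": "failure" seen in the history entry.
                                    {"term": {"result.actions.status": "failure"}},
                                    {"range": {"trigger_event.triggered_time": {"gte": "now-" + lookback}}},
                                ]
                            }
                        }
                    },
                }
            }
        },
        # Fire only when at least one failed action was found.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        "actions": {
            "log_failures": {
                "logging": {
                    "text": "{{ctx.payload.hits.total}} watch action(s) failed in the last " + lookback
                }
            }
        },
    }


print(json.dumps(build_failed_action_watch(), indent=2))
```

The sketch uses a logging action rather than email on purpose: if the mail account is the thing that is failing, a notifier that depends on the same account would fail silently too.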

@spinscale would be able to say for sure if there's a better way to handle these failures; give him a little time to reply.


@Joe_Fleming is right, there is no special mechanism. It is pretty hard to differentiate between different types of SMTP errors (connection refused, wrong credentials, temporary timeout) and decide when to retrigger based on what is temporary or permanent.

You can either check the master node log file or paste the full watch history entry here so we can take a look. If neither yields more information, we need to add more detail on the logging side.

