Error Watcher output for pager duty

Hello Community,

I have configured pagerduty watcher couple of weeks back and it successfully posted alerts to my company pagerduty portal when I excecute the watcher using simulation. So I thought everything seems working fine but later I realized it worked only when executing from simulation and getting execution failing error when watcher triggers for every 15min (even condition met) with PKIX cert issue.

Between now I am facing a strange issue, I have pagerduty URL in my watcher "mycompnay.pagerduty.com" and I am trying to execute using simulation tab and I am getting Internal server error but is confusing me is In watcher error logs in backend it's showing as failed to connect events.pagerduty.com but no where I gave that URL in my watcher.

My question is how it is possible and why it is contacting without having events.pagerduty.com in my watcher config. Moreover, it worked previously.

Below is my email config for pagerduty and error logs.

   "actions": {
    "notify-pagerduty": {
      "throttle_period_in_millis": 300000,
      "pagerduty": {
        "description": "ATTENTION: critical logs found in last 1hour and triggered at {{ctx.execution_time}}",
        "client_url": "https://kibana.corp.mycompany.com/",
        "attach_payload": true,
        "account": "team1",
        "contexts": [
          {
            "type": "link",
            "href": "https://mycompany.pagerduty.com"
          },
          {
            "type": "link",
            "href": "https://mycompany.pagerduty.com",
            "text": "View the incident on {{ctx.payload.link}}"
          }
        ]
      }
    }
  }

Simulation (execute) output error from the backend logs

**elasticsearch_cordinator** | org.apache.http.conn.ConnectTimeoutException: Connect to events.pagerduty.com:443 [events.pagerduty.com/13.56.226.208, events.pagerduty.com/18.144.19.36, events.pagerduty.com/54.203.47.92] failed: connect timed out

**elasticsearch_cordinator** | at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:150) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.common.http.HttpClient.lambda$execute$0(HttpClient.java:173) ~[?:?]

**elasticsearch_cordinator** | at java.security.AccessController.doPrivileged(Native Method) ~[?:?]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.core.common.socket.SocketAccess.doPrivileged(SocketAccess.java:32) ~[x-pack-core-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.common.http.HttpClient.execute(HttpClient.java:173) ~[?:?]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.notification.pagerduty.PagerDutyAccount.send(PagerDutyAccount.java:53) ~[?:?]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.actions.pagerduty.ExecutablePagerDutyAction.execute(ExecutablePagerDutyAction.java:50) ~[?:?]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.core.watcher.actions.ActionWrapper.execute(ActionWrapper.java:144) [x-pack-core-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.execution.ExecutionService.executeInner(ExecutionService.java:455) [x-pack-watcher-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.execution.ExecutionService.execute(ExecutionService.java:295) [x-pack-watcher-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at org.elasticsearch.xpack.watcher.transport.actions.execute.TransportExecuteWatchAction$1.doRun(TransportExecuteWatchAction.java:154) [x-pack-watcher-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]

**elasticsearch_cordinator** | at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]

**elasticsearch_cordinator** | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.5.0.jar:6.5.0]

**elasticsearch_cordinator** | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]

**elasticsearch_cordinator** | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]

**elasticsearch_cordinator** | at java.lang.Thread.run(Thread.java:834) [?:?]

**elasticsearch_cordinator** | Caused by: java.net.SocketTimeoutException: connect timed out

**elasticsearch_cordinator** | at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]

**elasticsearch_cordinator** | at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[?:?]

**elasticsearch_cordinator** | at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[?:?]

**elasticsearch_cordinator** | at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[?:?]

**elasticsearch_cordinator** | at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[?:?]

**elasticsearch_cordinator** | at java.net.Socket.connect(Socket.java:591) ~[?:?]

**elasticsearch_cordinator** | at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:337) ~[httpclient-4.5.2.jar:4.5.2]

**elasticsearch_cordinator** | ... 25 more

Hey,

the pagerduty action always sends to events.pagerduty.com. It just seems that this is not possible anymore - either the connection has been firewalled, or maybe DNS entries are stale (the JVM running Elasticsearch caches DNS entries, unless you are running 6.6.0 or above).

Can you access events.pagerduty.com from the elasticsearch nodes, when you sshing into the boxes and use curl? If so, restarting the process may be worth a try if you are running an older version than 6.6.0.

--Alex

It seems reboot helped me here but still watcher giving false results, I configured watcher to notify if level name matches CRITICAL butI simulate or even with triggered alerts it's reporting INFO logs.

Here is my Watcher results and watcher input

{
  "trigger": {
    "schedule": {
      "interval": "m"
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "dslogs-*"
        ],
        "types": [],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "beat.ip": "10.7.1.109"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-60m"
                    }
                  }
                }
              ],
              "should": [
                {
                  "match_phrase": {
                    "levelname": "CRITICAL"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "notify-pagerduty": {
      "throttle_period_in_millis": 300000,
      "pagerduty": {
        "description": "ATTENTION: critical logs found in last 1hour and triggered at {{ctx.execution_time}}",
        "client_url": "https://kibana1.corp.mycompany.com/",
        "attach_payload": true,
        "account": "team1",
        "contexts": [
          {
            "type": "link",
            "href": "https://mycompany.pagerduty.com"
          },
          {
            "type": "link",
            "href": "https://mycompany.pagerduty.com",
            "text": "View the incident on {{ctx.payload.link}}"
          }
        ]
      }
    }
  }
}

Simulation Results

    {
  "watch_id": "_inlined_",
  "node": "Bd-EWXLQTbKMG-oCTDvGhQ",
  "state": "executed",
  "user": "user1",
  "status": {
    "state": {
      "active": true,
      "timestamp": "2019-03-20T15:08:21.934Z"
    },
    "last_checked": "2019-03-20T15:08:21.934Z",
    "last_met_condition": "2019-03-20T15:08:21.934Z",
    "actions": {
      "notify-pagerduty": {
        "ack": {
          "timestamp": "2019-03-20T15:08:21.934Z",
          "state": "ackable"
        },
        "last_execution": {
          "timestamp": "2019-03-20T15:08:21.934Z",
          "successful": true
        },
        "last_successful_execution": {
          "timestamp": "2019-03-20T15:08:21.934Z",
          "successful": true
        }
      }
    },
    "execution_state": "executed",
    "version": -1
  },
  "trigger_event": {
    "type": "manual",
    "triggered_time": "2019-03-20T15:08:21.934Z",
    "manual": {
      "schedule": {
        "scheduled_time": "2019-03-20T15:08:21.934Z"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "dslogs-*"
        ],
        "types": [],
        "body": {
          "query": {
            "bool": {
              "must": [
                {
                  "term": {
                    "beat.ip": "10.7.1.109"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-60m"
                    }
                  }
                }
              ],
              "should": [
                {
                  "match_phrase": {
                    "levelname": "CRITICAL"
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "metadata": {
    "xpack": {
      "type": "json"
    }
  },
  "result": {
    "execution_time": "2019-03-20T15:08:21.934Z",
    "execution_duration": 466,
    "input": {
      "type": "search",
      "status": "success",
      "payload": {
        "_shards": {
          "total": 6,
          "failed": 0,
          "successful": 6,
          "skipped": 0
        },
        "hits": {
          "hits": [
            {
              "_index": "dslogs-2019.03",
              "_type": "doc",
              "_source": {
                "msg": "Processing data into output file.",
                "log": "",
                "relativeCreated": 44662.48106956482,
                "source": "/opt/mycompany/logs/estatement-email-notification.log",
                "funcName": "<module>",
                "processName": "MainProcess",
                "host": "mgmt-tsys",
                "@version": 1,
                "beat": {
                  "hostname": "mgmt-tsys",
                  "ip": "10.7.1.109",
                  "name": "mgmt-tsys",
                  "version": "6.5.4"
                },
                "msecs": 411.47303581237793,
                "levelname": "INFO",
                "process": 20589,
                "source_host": "mgmt-tsys",
                "offset": 508561,
                "created": 1553091346.411473,
                "module": "push-estatement-notification",
                "prospector": {
                  "type": "log"
                },
                "thread": 140710608230208,
                "levelno": 20,
                "threadName": "MainThread",
                "pathname": "push-estatement-notification.py",
                "tags": [
                  "beats_input_raw_event",
                  "tcde-logs"
                ],
                "args": [],
                "input": {
                  "type": "log"
                },
                "lineno": 64,
                "filename": "push-estatement-notification.py",
                "@timestamp": "2019-03-20T14:15:46.411Z",
                "name": "estatement-email-notification"
              },
              "_id": "KAJ1m2kBUVaYE_yqa0oT",
              "_score": 0.19269441
            },
          ],
          "total": 1,
          "max_score": 0.19269441
        },
        "took": 26,
        "timed_out": false
      },
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [
            "dslogs-*"
          ],
          "types": [],
          "body": {
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "beat.ip": "10.7.1.109"
                    }
                  }
                ],
                "filter": [
                  {
                    "range": {
                      "@timestamp": {
                        "gte": "now-60m"
                      }
                    }
                  }
                ],
                "should": [
                  {
                    "match_phrase": {
                      "levelname": "CRITICAL"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    },
    "condition": {
      "type": "compare",
      "status": "success",
      "met": true,
      "compare": {
        "resolved_values": {
          "ctx.payload.hits.total": 1
        }
      }
    },

Action-> pagerduty (since it's more than 7000 characters I am not including same content here :slight_smile:
}

Most surprising and funny part is I am getting pagerduty alerts even after I disable the watcher for those logs.

just use a gist next time instead of pasting in here.

The problem with the watch is within your query. You are using a should clause to filter for the level_name. This means it will only scored higher, but is not required to be in that document. You need to move this into the must clause as well.

Which Elasticsearch version are you running on?

Ok, I changed to must. I am using version 6.5.0

how did you disable the watch? Also you can always query the watch history indices for most recent runs of a watch, this way you can see what happened with each execution. See https://www.elastic.co/guide/en/elastic-stack-overview/6.6/watcher-getting-started.html

hope this helps!

In watcher there is option to deactivate watcher, I disabled using that. I tried using must but didn't worked where as I kept Should but added minimum_should_match:1 and it worked on simulation.

can you put the full output of the get watch API into a gist?

@spinscale I'm guessing this is not possible, but is there a way to remove the cached DNS entries without doing a rolling restart of the cluster? I was thinking of something similar to flushing the DNS entries on Linux or Windows.

Thanks.

FYI, we ran into this exact issue today. Our watchers were no longer able to connect to events.pagerduty.com. We're using Elastic Stack 6.6.1. A rolling restart fixed the issue with cached DNS entries .

Here it is, this is working one.

@mikemadden42 this should not happen with Elasticsearch 6.6 anymore, see https://github.com/elastic/elasticsearch/pull/36570

Just to be sure: Have you changed anything from the default configuration? If not, it might be worth opening an issue in github.

Hi @spinscale, thanks for the reference to the pull request. Based on the commits, I'm guessing that 6.6 versions of elasticsearch, we should have the following lines in our jvm.options?

-Des.networkaddress.cache.ttl=60

I'm not seeing that option in our jvm.options file.

$ grep networkaddress /etc/elasticsearch/jvm.options
$

Yes, that is the option I referred to. Is it possible that you changed that file at some point manually and you have been installing Elasticsearch via package repositories? This might be the reason why the file did not get replaced.

Just go ahead and add it, and everything should be fine.

It seems issue happening again and I am 6.5. is that ok to add -Des.networkaddress.cache.ttl=60 in JVM options or any work around for 6.5 apart from keep rolling restart the nodes?

this option will not be applied in Elasticsearch 6.5, you need to run on 6.6 at least or restart the node.

--Alex

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.