How to increase watcher search timeout?

Hi,

I've created an advanced watcher.

This is the input part:

  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "phplog-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "match": {
                    "type": "response_log"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-3d/d",
                      "to": "now"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "group_by_route": {
              "terms": {
                "field": "route_format.keyword",
                "size": 999
              },
              "aggs": {
                "group_by_method": {
                  "terms": {
                    "field": "method.keyword",
                    "size": 10
                  },
                  "aggs": {
                    "baseline": {
                      "avg": {
                        "field": "execution_time"
                      }
                    },
                    "short": {
                      "range": {
                        "field": "@timestamp",
                        "ranges": [
                          {
                            "from": "now-1h",
                            "to": "now"
                          }
                        ]
                      },
                      "aggs": {
                        "current": {
                          "avg": {
                            "field": "execution_time"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }

When the watcher execution takes less than 1 minute, it returns the correct result:

"result": {
    "execution_time": "2020-10-21T02:05:00.446Z",
    "execution_duration": 46471,
    "input": {
      "type": "search",
      "status": "success",
      "payload": {
        "_shards": {
          "total": 72,
          "failed": 0,
          "successful": 72,
          "skipped": 0
        },
        "hits": {
          "hits": [],
          "total": 10000,
          "max_score": null
        },
        "took": 45935,
        "_clusters": {
          "total": 1,
          "successful": 1,
          "skipped": 0
        },
        "timed_out": false,
        "aggregations": {
          "group_by_route": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "doc_count": 571632628,
                "group_by_method": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
~~~

But when the "execution_duration" is larger than 1 minute, I get a result like:

"type": "timeout_exception",
"reason": "java.util.concurrent.TimeoutException: Timeout waiting for task."

 "result": {
    "execution_time": "2020-10-20T18:05:00.374Z",
    "execution_duration": 60000,
    "input": {
      "type": "search",
      "status": "failure",
      "error": {
        "root_cause": [
          {
            "type": "timeout_exception",
            "reason": "java.util.concurrent.TimeoutException: Timeout waiting for task."
          }
        ],
        "type": "timeout_exception",
        "reason": "java.util.concurrent.TimeoutException: Timeout waiting for task.",
        "caused_by": {
          "type": "timeout_exception",
          "reason": "Timeout waiting for task."
        }
      },
      "search": {
        "request": {
          "search_type": "query_then_fetch",
          "indices": [
            "phplog:phplog-*"
          ],
          "rest_total_hits_as_int": true,
          "body": {
            "size": 0,
            "query": {
              "bool": {
                "filter": [
                  {
                    "match": {
~~~

Is it possible to increase the search timeout in a watcher?

Thank you! :slightly_smiling_face:

You can try setting a search timeout.
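
As a rough sketch, assuming the search input's own timeout attribute (a sibling of "request", not part of the search body) is what is hitting the one-minute default here - the "2m" value is only an example, and the exact attribute placement should be checked against the search input docs for your version:

  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "phplog:phplog-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0
        }
      },
      "timeout": "2m"
    }
  }

The body is trimmed here; the full query and aggs from your watch would go back into "body" unchanged.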

Is there another reason for the long runtime that could be tuned, like a cross-cluster search across several data centers? Why is that query taking 45 seconds? OTOH, I just saw that you are aggregating on more than 500 million documents in the first bucket - that might take some time.

I use remote clusters to search my index.

Could that be why Elasticsearch takes a long time to search?

Indeed! We are aggregating multiple times in this watcher...

Is it a bad idea?

          "aggs": {
            "group_by_route": {
              "terms": {
                "field": "route_format.keyword",
                "size": 999
              },
              "aggs": {
                "group_by_method": {
                  "terms": {
                    "field": "method.keyword",
                    "size": 10
                  },
                  "aggs": {
                    "baseline": {
                      "avg": {
                        "field": "execution_time"
                      }
                    },
                    "short": {
                      "range": {
                        "field": "@timestamp",
                        "ranges": [
                          {
                            "from": "now-1h",
                            "to": "now"
                          }
                        ]
                      },
                      "aggs": {
                        "current": {
                          "avg": {
                            "field": "execution_time"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }

Well, if that query answers your question, it is a good one, regardless of how many documents you need to touch :slight_smile:

Can you share the full query? Is it covering time-based indices? Maybe there is some room for optimization.

Sure!
By the way, I've tried adding a search timeout of 5s, but it does not time out after 5 seconds :astonished:

{
  "trigger": {
    "schedule": {
      "hourly": {
        "minute": [
          5
        ]
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "phplog:phplog-*"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "timeout": "5s",
          "query": {
            "bool": {
              "filter": [
                {
                  "match": {
                    "type": "response_log"
                  }
                },
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-3d/d",
                      "to": "now"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "group_by_route": {
              "terms": {
                "field": "route_format.keyword",
                "size": 999
              },
              "aggs": {
                "group_by_method": {
                  "terms": {
                    "field": "method.keyword",
                    "size": 10
                  },
                  "aggs": {
                    "baseline": {
                      "avg": {
                        "field": "execution_time"
                      }
                    },
                    "short": {
                      "range": {
                        "field": "@timestamp",
                        "ranges": [
                          {
                            "from": "now-1h",
                            "to": "now"
                          }
                        ]
                      },
                      "aggs": {
                        "current": {
                          "avg": {
                            "field": "execution_time"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "source": "ctx.payload.aggregations.group_by_route.buckets.stream().map(r -> { ArrayList arr=new ArrayList(); for(int i = 0; i < r.group_by_method.buckets.length; i++){ def target = r.group_by_method.buckets.get(i); if(target.short.buckets[0].current.value != null && target.baseline.value != null && target.short.buckets[0].current.value > target.baseline.value*2 ){ arr.add(target); }} return [ 'route': r.key, 'buckets': arr]; }).filter(r -> { return r.buckets.length > 0 }).count() > 0",
      "lang": "painless"
    }
  },
  "actions": {
    "my_webhook": {
      "webhook": {
        "scheme": "https",
        "host": "api.telegram.org",
        "port": 443,
        "method": "get",
        "path": "/bot******:**********/sendMessage",
        "params": {
          "text": "@@@    回應速度變慢    @@@\r\n{{#ctx.payload.result}}API: {{route}}\r\n{{#buckets}}Method: {{key}}\r\n平均回應時間: {{baseline.value}}\r\n當前回應時間: {{short.buckets.0.current.value}}\r\n{{/buckets}}\r\n=====\r\n{{/ctx.payload.result}}\r\n",
          "chat_id": "-*****"
        },
        "headers": {},
        "proxy": {
          "host": "172.17.99.59",
          "port": 3128
        }
      }
    }
  },
  "transform": {
    "script": {
      "source": "return ['result': ctx.payload.aggregations.group_by_route.buckets.stream().map(r -> { ArrayList arr=new ArrayList(); for(int i = 0; i < r.group_by_method.buckets.length; i++){ def target = r.group_by_method.buckets.get(i); if(target.short.buckets[0].current.value != null && target.baseline.value != null && target.short.buckets[0].current.value > target.baseline.value*2 ){ arr.add(target); }} return [ 'route': r.key, 'buckets': arr]; }).filter(r -> { return r.buckets.length > 0 }).collect(Collectors.toList())]",
      "lang": "painless"
    }
  }
}

Hey,

You could play around with the shard-level request cache and check if it is being used. See https://www.elastic.co/guide/en/elasticsearch/reference/7.9/shard-request-cache.html - using the indices stats API you can check whether the cache is already being used: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/indices-stats.html
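
For example, the hit/miss counters in the request_cache section of the stats can be compared between watcher runs to see whether the cache is doing anything (note that requests containing a relative "now" in a date range are typically not cached, which may be relevant here); the index pattern below just mirrors this thread, adjust it to your cluster:

GET phplog-*/_stats/request_cache?human=true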

You could also add some date math to your index names to make sure fewer shards are queried (even though I assume that this will not be the issue), as sketched below.
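
As a sketch of that idea, assuming daily indices named phplog-yyyy.MM.dd (the naming pattern is an assumption, and how the remote cluster prefix combines with date math would need checking for your setup), the input could list only the last four days instead of everything matching phplog-*:

        "indices": [
          "<phplog-{now/d}>",
          "<phplog-{now/d-1d}>",
          "<phplog-{now/d-2d}>",
          "<phplog-{now/d-3d}>"
        ]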

You could also use the X-Pack search profiler and check where most of the time is spent.
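
If that is easier than the Kibana UI, you can also add "profile": true to the same search body and run it as a plain search; only the outer terms aggregation is kept below to keep the example short, and the full nested aggs block from your watch can be pasted back in:

GET phplog:phplog-*/_search
{
  "size": 0,
  "profile": true,
  "query": {
    "bool": {
      "filter": [
        { "match": { "type": "response_log" } },
        { "range": { "@timestamp": { "gte": "now-3d/d", "to": "now" } } }
      ]
    }
  },
  "aggs": {
    "group_by_route": {
      "terms": { "field": "route_format.keyword", "size": 999 }
    }
  }
}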

Thanks for the information, it's really helpful. :smiley:
