Heavy Query (SubAggregation + Nested Aggregation) performance optimization issue with Vega in Kibana

Hi. I'm using Elasticsearch / Kibana 6.4.1,
with the built-in Vega support in Kibana (Vega v3, not Vega-Lite).

I'm developing a log monitoring system for tracking performance trends.

There are multiple WAS (web application servers); each one measures performance (DB, cache, request/response, call stack, etc.) and sends the resulting logs to a log server.

I have one log server running EFK (Elasticsearch, Fluentd, Kibana).

The log server's spec is below. This server is in the TEST environment, not PRODUCTION.
CPU: 8 cores / RAM: 64 GB

I'm drawing the performance trend graphs with Vega so the graphs can interact with each other and so I can build more advanced, customized visualizations.

From Vega, I send the following query to Elasticsearch. This is my heavy query:

 {
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "date": {
            "gte": "now-2d/d",
            "lte": "now+1d/d"
          }
        }
      }
    }
  },
  "aggs": {
    "timeByDate": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "controllers": {
          "terms": {
            "field": "controller"
          },
          "aggs": {
            "result": {
              "nested": {
                "path": "profiles"
              },
              "aggs": {
                "profiles": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "profiles.key": "profile_total"
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "result": {
                      "terms": {
                        "field": "profiles.key"
                      },
                      "aggs": {
                        "delay": {
                          "avg": {
                            "field": "profiles.delay"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "timeByMinute": {
      "date_histogram": {
        "field": "date",
        "interval": "minute",
        "format": "yyyy-MM-dd HH:mm"
      },
      "aggs": {
        "controllers": {
          "terms": {
            "field": "controller"
          },
          "aggs": {
            "result": {
              "nested": {
                "path": "profiles"
              },
              "aggs": {
                "profiles": {
                  "terms": {
                    "field": "profiles.key"
                  },
                  "aggs": {
                    "delay": {
                      "avg": {
                        "field": "profiles.delay"
                      }
                    },
                    "methods": {
                      "terms": {
                        "field": "profiles.method"
                      },
                      "aggs": {
                        "delay": {
                          "avg": {
                            "field": "profiles.delay"
                          }
                        },
                        "contents": {
                          "terms": {
                            "field": "profiles.content"
                          },
                          "aggs": {
                            "contentDelay": {
                              "avg": {
                                "field": "profiles.delay"
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "controllers": {
      "terms": {
        "field": "controller"
      },
      "aggs": {
        "times": {
          "date_histogram": {
            "field": "date",
            "interval": "minute",
            "format": "yyyy-MM-dd HH:mm"
          },
          "aggs": {
            "result": {
              "nested": {
                "path": "profiles"
              },
              "aggs": {
                "profile": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "profiles.key": "profile_total"
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "delay": {
                      "avg": {
                        "field": "profiles.delay"
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "currentKey": {
          "max": {
            "field": "date"
          }
        }
      }
    }
  }
}

With some test logs (almost 600 MB), the query above doesn't work.
And in the PRODUCTION environment, the log volume will be on the order of xxx GB (probably).

A 'Too Many Buckets (10001)' error occurs,
and the request times out (30 seconds).

Even when the error and the timeout don't occur, it's too slow to be usable.

So I have to refactor this big query.
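As far as I know, the bucket limit comes from the search.max_buckets cluster setting (it's dynamic), so I could raise it as a temporary band-aid with something like the request below, but I assume that only hides the problem and makes the query even heavier:

PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 65536
  }
}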

I know about the Scroll API for pagination and performance.

But Vega probably can't use it (and as far as I understand, scroll only paginates hits, not aggregation buckets, so it might not help here anyway).
Also, Vega doesn't seem to support parameterizing the Elasticsearch request with the value of a clicked component
(the way a SQL WHERE clause would). (I may not be stating this exactly right, but I think it's roughly correct.)

That's why I'm using a heavy query like the one above: to fetch all the data I need in one request.
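One thing I did find in the docs: Kibana's Vega integration can inject the dashboard filters and the time picker into the data url via %context% and %timefield%, so at least the hard-coded now-2d/d range isn't needed. A minimal sketch (the index name is made up, and this still doesn't solve the "clicked value" problem):

"data": {
  "url": {
    "%context%": true,
    "%timefield%": "date",
    "index": "my-log-index-*",
    "body": {
      "size": 0,
      "aggs": {
        "timeByDate": {
          "date_histogram": { "field": "date", "interval": "day", "format": "yyyy-MM-dd" }
        }
      }
    }
  }
}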

I can split the three top-level aggs (timeByDate, timeByMinute, controllers) into separate queries.

But even on its own, timeByMinute is unusably slow: the three-day range gives over 4,000 minute buckets, and each of them fans out into controller x profiles.key x profiles.method x profiles.content terms buckets (default terms size is 10 each), so the bucket count explodes far past the 10,000 limit.

And I do need the aggregation results for my graphs.

How can I optimize this?

Should I develop my own web application, and use the Scroll API or caching there?
(But then drawing the graphs and attaching event listeners becomes hard.)

I've been thinking about this for a week, but I have no ideas at all.

Here is what I've considered so far:

  1. I don't need the hits, so I set the query's size to 0.
    -> As I said, Vega can't use the Scroll API, and there are more than 10,000 log documents anyway.
  2. I read that putting conditions in the query's filter (filter context) is generally faster, which is why the range is in a bool filter.
  3. Pre-process the raw logs in the backend into the aggregated results I actually need for drawing the graphs (before any request from Vega), so that Vega only scans the pre-aggregated result documents (see the sketch after this list).
    -> In this case, the raw-log index would be separate from the aggregated-result index.
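To make idea 3 a bit more concrete: a backend job could run the heavy aggregation periodically and flatten every bucket into one small document in a separate summary index. The field names below are just made up for illustration; the idea is one flat document per minute / controller / profile key:

{
  "date": "2018-10-18T09:41:00",
  "controller": "OrderController",
  "profile_key": "profile_total",
  "method": "doOrder",
  "avg_delay": 132.7,
  "log_count": 512
}

Then the Vega query against the summary index would only need a plain date_histogram + terms with an avg on avg_delay (keeping log_count so minute-level averages can be re-weighted when rolling up to days), instead of the nested aggregations above.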

Can you give me any ideas?
