Heavy Query (SubAggregation + Nested Aggregation) performance optimization issue with Vega in Kibana

Hi. I'm using Elasticsearch / Kibana 6.4.1,
with the built-in Vega support in Kibana (Vega v3, not Vega-Lite).

I'm developing a log monitoring system for tracking performance trends.

There are multiple WAS (web application servers); each one measures performance (DB, cache, request/response, call stack, etc.) and sends the resulting logs to a log server.

I have one log server running EFK (Elasticsearch, Fluentd, Kibana).

The log server's spec is below. This server is in the TEST environment, not PRODUCTION.
CPU: 8 cores / RAM: 64 GB

I'm drawing the performance trend graphs with Vega so the graphs can interact with each other and so I can build more advanced, customized visualizations.

From Vega, I send the following query to Elasticsearch. This is my heavy query:

 {
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "date": {
            "gte": "now-2d/d",
            "lte": "now+1d/d"
          }
        }
      }
    }
  },
  "aggs": {
    "timeByDate": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "controllers": {
          "terms": {
            "field": "controller"
          },
          "aggs": {
            "result": {
              "nested": {
                "path": "profiles"
              },
              "aggs": {
                "profiles": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "profiles.key": "profile_total"
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "result": {
                      "terms": {
                        "field": "profiles.key"
                      },
                      "aggs": {
                        "delay": {
                          "avg": {
                            "field": "profiles.delay"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "timeByMinute": {
      "date_histogram": {
        "field": "date",
        "interval": "minute",
        "format": "yyyy-MM-dd HH:mm"
      },
      "aggs": {
        "controllers": {
          "terms": {
            "field": "controller"
          },
          "aggs": {
            "result": {
              "nested": {
                "path": "profiles"
              },
              "aggs": {
                "profiles": {
                  "terms": {
                    "field": "profiles.key"
                  },
                  "aggs": {
                    "delay": {
                      "avg": {
                        "field": "profiles.delay"
                      }
                    },
                    "methods": {
                      "terms": {
                        "field": "profiles.method"
                      },
                      "aggs": {
                        "delay": {
                          "avg": {
                            "field": "profiles.delay"
                          }
                        },
                        "contents": {
                          "terms": {
                            "field": "profiles.content"
                          },
                          "aggs": {
                            "contentDelay": {
                              "avg": {
                                "field": "profiles.delay"
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "controllers": {
      "terms": {
        "field": "controller"
      },
      "aggs": {
        "times": {
          "date_histogram": {
            "field": "date",
            "interval": "minute",
            "format": "yyyy-MM-dd HH:mm"
          },
          "aggs": {
            "result": {
              "nested": {
                "path": "profiles"
              },
              "aggs": {
                "profile": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "term": {
                            "profiles.key": "profile_total"
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "delay": {
                      "avg": {
                        "field": "profiles.delay"
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "currentKey": {
          "max": {
            "field": "date"
          }
        }
      }
    }
  }
}

With some test logs (almost 600 MB), the query above doesn't work.
And in the PRODUCTION environment, the log volume will be on the order of xxx GB (probably).

A 'Too Many Buckets (10001)' error occurs,
and the request times out (30 seconds).

Even when the error and the timeout don't occur, it's too slow to be usable.

So I have to refactor this big query.
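As far as I know, the bucket limit comes from the search.max_buckets cluster setting (it's dynamic), so I could raise it as a temporary band-aid with something like the request below, but I assume that only hides the problem and makes the query even heavier:

PUT _cluster/settings
{
  "transient": {
    "search.max_buckets": 65536
  }
}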

I know about the Scroll API for pagination and performance.

But Vega probably can't use it (and as far as I understand, scroll only paginates hits, not aggregation buckets, so it might not help here anyway).
Also, Vega doesn't seem to support parameterizing the Elasticsearch request with the value of a clicked component
(the way a SQL WHERE clause would). (I may not be stating this exactly right, but I think it's roughly correct.)

That's why I'm using a heavy query like the one above: to fetch all the data I need in one request.
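One thing I did find in the docs: Kibana's Vega integration can inject the dashboard filters and the time picker into the data url via %context% and %timefield%, so at least the hard-coded now-2d/d range isn't needed. A minimal sketch (the index name is made up, and this still doesn't solve the "clicked value" problem):

"data": {
  "url": {
    "%context%": true,
    "%timefield%": "date",
    "index": "my-log-index-*",
    "body": {
      "size": 0,
      "aggs": {
        "timeByDate": {
          "date_histogram": { "field": "date", "interval": "day", "format": "yyyy-MM-dd" }
        }
      }
    }
  }
}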

I can split the three top-level aggs (timeByDate, timeByMinute, controllers) into separate queries.

But even on its own, timeByMinute is unusably slow: the three-day range gives over 4,000 minute buckets, and each of them fans out into controller x profiles.key x profiles.method x profiles.content terms buckets (default terms size is 10 each), so the bucket count explodes far past the 10,000 limit.

And I do need the aggregation results for my graphs.

How can I optimize this?

Should I develop my own web application, and use the Scroll API or caching there?
(But then drawing the graphs and attaching event listeners becomes hard.)

I've been thinking about this for a week, but I have no ideas at all.

Here is what I've considered so far:

  1. I don't need the hits, so I set the query's size to 0.
    -> As I said, Vega can't use the Scroll API, and there are more than 10,000 log documents anyway.
  2. I read that putting conditions in the query's filter (filter context) is generally faster, which is why the range is in a bool filter.
  3. Pre-process the raw logs in the backend into the aggregated results I actually need for drawing the graphs (before any request from Vega), so that Vega only scans the pre-aggregated result documents (see the sketch after this list).
    -> In this case, the raw-log index would be separate from the aggregated-result index.
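To make idea 3 a bit more concrete: a backend job could run the heavy aggregation periodically and flatten every bucket into one small document in a separate summary index. The field names below are just made up for illustration; the idea is one flat document per minute / controller / profile key:

{
  "date": "2018-10-18T09:41:00",
  "controller": "OrderController",
  "profile_key": "profile_total",
  "method": "doOrder",
  "avg_delay": 132.7,
  "log_count": 512
}

Then the Vega query against the summary index would only need a plain date_histogram + terms with an avg on avg_delay (keeping log_count so minute-level averages can be re-weighted when rolling up to days), instead of the nested aggregations above.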

Can you give me any ideas?
