Questions Regarding Sibling Pipeline Aggregation


#1

Greetings,

My question concerns processing time series data with the sibling pipeline aggregation, e.g. to create a line chart.

As the first aggregation, I need to sum up the buckets of several time series. The bucket size should always equal the original temporal interval (in my case 5 minutes for all time series). As the second aggregation, I need to find the maximum value among these buckets.

If the bucket size equals 5 minutes, e.g. for a line chart with a date histogram, the result is straightforward. If I zoom out, e.g. to the last 24 hours or the last 7 days, the interval of the line chart changes to, for example, 30 minutes or 3 hours. For these intervals, I need the maximum value of the original 5-minute buckets within each broader 30-minute or 3-hour bucket.
Instead, a summed value referring to 10 minutes or 1 hour is shown. The request also shows that the intended 5-minute buckets are not queried; a broader interval of 10 minutes or 1 hour is used instead. To illustrate this, I have added 2 screenshots covering the case for the last 7 days.
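For comparison, the computation described above (the largest 5-minute sum inside each wider display bucket) can be written directly in the Elasticsearch query DSL as an outer `date_histogram`, an inner fixed 5-minute `date_histogram` with a `sum`, and a `max_bucket` sibling pipeline next to the inner histogram. This is a minimal sketch, not the request Kibana generates: the field names `@timestamp` and `value` and the aggregation names `display`, `per_5m`, `sum_5m` and `max_5m_sum` are hypothetical, and it uses the 6.x `interval` parameter (Elasticsearch 7.2 and later use `fixed_interval` instead).

```python
import json


def build_max_of_5m_sums(outer_interval: str,
                         timestamp_field: str = "@timestamp",
                         value_field: str = "value") -> dict:
    """Build a hypothetical Elasticsearch 6.x search body.

    The outer buckets use the display interval (e.g. "30m" or "3h"). Inside
    each one, a fixed 5-minute date_histogram sums the values, and a
    max_bucket sibling pipeline picks the largest of those 5-minute sums.
    Field and aggregation names are illustrative assumptions.
    """
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "display": {
                "date_histogram": {"field": timestamp_field,
                                   "interval": outer_interval},
                "aggs": {
                    # First aggregation: sum per original 5-minute bucket.
                    "per_5m": {
                        "date_histogram": {"field": timestamp_field,
                                           "interval": "5m"},
                        "aggs": {"sum_5m": {"sum": {"field": value_field}}},
                    },
                    # Second aggregation: sibling pipeline at the same level
                    # as per_5m, taking the max of its sum_5m values.
                    "max_5m_sum": {
                        "max_bucket": {"buckets_path": "per_5m>sum_5m"}
                    },
                },
            }
        },
    }


body = build_max_of_5m_sums("30m")
print(json.dumps(body, indent=2))
```

Because the 5-minute interval is fixed in the inner histogram, widening the display interval only changes how many 5-minute sums each outer bucket compares, not the size of the sums themselves.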


My questions are:

  1. Why is the interval automatically changed from 5 minutes to 1 hour when, for example, the last 7 days are shown?
  2. Is there a way to force Kibana to stick to the original interval selection made in the sibling pipeline aggregation?
  3. Why is there a hint/warning such as "This interval creates too many buckets to show in the selected time range, so it has been scaled to hour" in Kibana version 6.2.4 but not in Kibana version 6.4.2? I have not yet checked the behavior for Kibana version 6.5.0.

Many thanks in advance.


(Spencer Alger) #2
  1. The interval scaling is done because of the metrics:max_buckets setting. To avoid pulling in too much data, we scale the interval when possible. The data should still be pretty accurate.
  2. To force Kibana not to scale, you can increase metrics:max_buckets in Management > Advanced Settings to something huge. Then scaling won't kick in until much later.
  3. That looks like a bug; I submitted https://github.com/elastic/kibana/issues/25982 to track it and hopefully get it fixed.

#3

Thanks for your quick reply. Now I have found time to go through your answers.

  1. I understand that you need to scale data whenever possible. Regarding your last sentence, in my case the results are no longer accurate. One hour contains twelve 5-minute intervals, so there is a factor of 12 between a sum over a 5-minute interval and a sum over a 1-hour interval. My use case is about getting results based on summed 5-minute intervals for variable time ranges (hours, days, weeks, ...).
  2. Thanks for pointing this out. Unfortunately, however, this setting stops working once the page is reloaded, e.g. when the visualization is saved in a dashboard and that dashboard is opened. The behaviour is still the same as described above. I have set metrics:max_buckets to a very high value such as 1000000.
  3. Thanks for reporting this bug.
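The factor of 12 from point 1 can be illustrated with a small worked example. This sketch uses hypothetical data: it compares a single sum over one hour (what a chart scaled to 1-hour buckets effectively shows) with the maximum of the twelve 5-minute sums in that hour (what the question asks for). The helper names are illustrative only.

```python
from typing import List


def hourly_sum(five_min_sums: List[float]) -> float:
    # One sum over the whole hour, i.e. what a 1-hour bucket reports.
    return sum(five_min_sums)


def max_of_five_min_sums(five_min_sums: List[float]) -> float:
    # The largest 5-minute sum within the hour, i.e. the intended result.
    return max(five_min_sums)


# Hypothetical hour with a constant load: twelve 5-minute sums of 10.
constant_hour = [10.0] * 12
print(hourly_sum(constant_hour))            # 120.0
print(max_of_five_min_sums(constant_hour))  # 10.0

# Hypothetical hour with one peak: eleven sums of 5 and one of 40.
peak_hour = [5.0] * 11 + [40.0]
print(hourly_sum(peak_hour))                # 95.0
print(max_of_five_min_sums(peak_hour))      # 40.0
```

The ratio is exactly 12 only when every 5-minute sum in the hour is equal; with uneven data the hourly sum differs from the intended maximum by a varying amount, so the scaled chart cannot simply be divided by 12 to recover it.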

I would really appreciate it if you could also have a look at point 2. It would be interesting to see if you can reproduce this behaviour.

Again, many thanks in advance.


#4

This issue affects me as well. I tried to fix it by increasing metrics:max_buckets, but it didn't help.
@spalger Can you give any further advice? That would be really great.
Thank you for your support!