Heartbeat data roll-up

Hi

I was wondering whether there are any good practices for rolling up Heartbeat data in Elasticsearch, or whether anyone has experience with it.

Some background:
We're collecting uptime data from multiple systems, and the amount of data can obviously get quite big (the monitors are set to collect data every 15 seconds). However, our need for granularity/resolution decreases over time: for the past 7 days a granularity of 15 seconds is good, but that is not true for data from, e.g., 1 month back (there, buckets of 5-minute averages would be enough). For data even further back (e.g. 3 months and older), buckets of 60-minute averages would be enough.
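To put these tiers in numbers, here is a quick back-of-the-envelope sketch. It assumes one document per monitor per check, which is a simplification (Heartbeat may write more than one event per check, e.g. for a monitor that resolves to several IPs).

```python
# Documents (or buckets) one monitor produces per day at each resolution.
SECONDS_PER_DAY = 24 * 60 * 60


def docs_per_day(interval_seconds: int) -> int:
    """Return how many fixed-size intervals fit into one day."""
    return SECONDS_PER_DAY // interval_seconds


raw = docs_per_day(15)           # raw 15-second checks
five_min = docs_per_day(5 * 60)  # 5-minute buckets
hourly = docs_per_day(60 * 60)   # 60-minute buckets

print(raw, five_min, hourly)           # 5760 288 24
print(raw // five_min, raw // hourly)  # 20 240
```

So each 5-minute bucket replaces about 20 raw documents per monitor, and each 60-minute bucket about 240.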

The ideas so far:

  1. The Heartbeat index contains raw data for the past 2 weeks
  2. A roll-up job aggregates the data into 5-minute buckets
  3. ILM takes care of deleting raw data older than 2 weeks
  4. Another roll-up job aggregates the roll-up index from point 2 above into another index with 60-minute buckets
  5. ILM takes care of deleting rolled-up data from point 2
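For point 3, a minimal ILM policy sketch might look like the following. The policy name and the rollover thresholds are assumptions for illustration only. Note that ILM acts on whole indices, so old raw data only goes away cleanly if it sits in its own (rolled-over) index.

```python
import json

# Hypothetical ILM policy for the raw Heartbeat indices (point 3).
# Assumes indices roll over daily or at 50 GB, so that each whole index
# can be deleted once it is past the 2-week raw-data window.
raw_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_size": "50gb"}
                }
            },
            "delete": {
                # When rollover is used, min_age counts from the rollover time.
                "min_age": "14d",
                "actions": {"delete": {}},
            },
        }
    }
}

# Request body for: PUT _ilm/policy/heartbeat-raw-14d  (name is hypothetical)
print(json.dumps(raw_policy, indent=2))
```

The same shape with a longer `min_age` could in principle cover point 5, with the caveat that deleting a single roll-up index removes all of the data in it, not just the oldest buckets.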

Questions:

  • Are steps 4 and 5 even possible?
  • Does the Uptime app in Kibana support roll-up indices?
  • How do you maintain your Heartbeat data? I'm struggling to get a reasonable setup...

Cheers,
Stefan

P.S. The entire Elastic environment is running on version 7.6.


Sorry we missed this. I've moved it to the Heartbeat forum, which is more appropriate.

This is a great idea for a feature. There's nothing stopping you from doing rollups today, but the Uptime UI is not built to support rolled-up data, so you'd need to build custom dashboards on top of the rolled-up data. The Uptime app currently depends on the schema Heartbeat sends.

That said, it's something we'll probably want to add in the future, although it's not a frequent request. It'd help to have some

To answer your questions:

  1. I believe those are both possible.
  2. No, it does not.
  3. It'd be great to know some of the parameters for your setup. We find that different people have different expectations for data usage, data fidelity, retention etc. If you could provide more detail here that'd be hugely useful for us to know as we choose which features to prioritize.

As for what a good rollup job would look like: off the top of my head, I think you'd want to roll up the summary.up and summary.down fields and group by monitor.id. Those fields count how many individual monitors were up/down for a given check, so you could simply sum them within a bucket for a given monitor. You might also want an average of monitor.duration.us to get the overall timing. Is that enough to get going on?
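As a rough sketch of that suggestion, a rollup job body for the create-rollup-job API could look like this. The job id, index pattern, rollup index name, cron schedule and page size are all hypothetical placeholders; only the grouping and metrics follow the advice above.

```python
import json

# Hypothetical rollup job for raw Heartbeat data: 5-minute buckets per monitor.
rollup_job = {
    "index_pattern": "heartbeat-*",           # assumed raw index pattern
    "rollup_index": "heartbeat-rollup-5m",    # hypothetical target index
    "cron": "0 0/5 * * * ?",                  # hypothetical: run every 5 minutes
    "page_size": 1000,
    "groups": {
        "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "5m",
        },
        # One series of buckets per monitor.
        "terms": {"fields": ["monitor.id"]},
    },
    "metrics": [
        # Sum the per-check up/down counts within each bucket.
        {"field": "summary.up", "metrics": ["sum"]},
        {"field": "summary.down", "metrics": ["sum"]},
        # Average check duration within each bucket.
        {"field": "monitor.duration.us", "metrics": ["avg"]},
    ],
}

# Request body for: PUT _rollup/job/heartbeat-5m  (job id is hypothetical)
print(json.dumps(rollup_job, indent=2))
```

A newly created job has to be started separately (POST _rollup/job/<job_id>/_start), and the rolled-up index is queried through the _rollup_search endpoint rather than a plain search.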
