Scheduled query service

Hi,

I'm looking for an existing service/application that would execute scheduled queries I define and insert the results into an ES index.
My goal is to aggregate data so that I can extend the retention time of key metrics. This way I'd be able to see trends over a longer period without keeping granular data, which costs a lot of disk space (not to mention replicas and snapshots...).
Is there such a service, or do I need to write an app myself?

Thanks!

Why not just use the rollup API?

It's not clear to me which components of X-Pack aren't available with the free subscription, so I usually don't even open anything related to X-Pack, since I can't afford it.

Thank you! I'll start reading and experimenting with it right away :slight_smile:

You can see that on this page: https://www.elastic.co/subscriptions

For example:

Thanks, I'll bookmark that page.

How likely is it that this feature (or any other feature showing the warning message) will be removed? And if it is removed, how long is the deprecation period usually?

Sorry? Which message?

Sorry, I meant to include it:

This message does not mean that we are going to remove the feature. When we want to remove something, we deprecate the API first, which is not the case here.

Basically, this message says that:

  • If we find out that it's a dangerous feature (think data integrity or things like that), we might say "that was a bad idea, sorry."
  • But most of the time (if not always), it just means that the API might change and that, at this stage, we cannot guarantee compatibility of this API between versions.

Once the API is stable/mature enough (i.e. we have had a lot of positive feedback and not many issues with it), we can remove the message.

I hope this makes sense.

Thanks, that's good to know!

I think the Rollup API will be able to solve most of the problems I face, but there are things it lacks, or more precisely, things that aren't in the scope of the API.

Let's say I have a million documents every day and I'd like to see how the top 10 values of the XY term change over time. Even if it isn't a high-cardinality field, I wouldn't want to create 3,000 groups just to query them again.
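One way to express this periodic query, as a minimal sketch assuming a daily run over the previous day and a time field named `@timestamp` (the function names and the `top_terms` aggregation name are hypothetical): ask a `terms` aggregation for only the top 10 buckets and reshape the response into one small summary document.

```python
def top_terms_query(field, size=10, time_field="@timestamp"):
    """Build a _search body returning the top `size` values of `field` for the previous day."""
    return {
        "size": 0,  # skip the hits themselves, only the aggregation is needed
        "query": {"range": {time_field: {"gte": "now-1d/d", "lt": "now/d"}}},
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def top_terms_doc(response, field):
    """Turn a _search response into one small summary document."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return {
        "field": field,
        "top": [{"term": b["key"], "count": b["doc_count"]} for b in buckets],
    }
```

Storing one such document per day keeps 10 term/count pairs rather than every distinct value. Keep in mind that `terms` counts can be approximate when the index has several shards.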

Sometimes I need to know how many documents I had in a given time period and how many had the field "foo" in it, or the field "bar", etc.
Something like:
Document_total_count: 1,000,000
foo_exists: 6,782
bar_exists: 442,332

So I still need a way to store the results of a query that is executed periodically.

Maybe @polyfractal has an idea?

Maybe you could try an exists filter aggregation, like

"aggs": {
  "results_with_field": {
    "filter": {
      "exists": {
        "field": "name"
      }
    }
  }
}

and then read the doc_count of that filter bucket.
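Building on that suggestion, a minimal sketch that adds one `filter`/`exists` aggregation per field and collects the counts into a single document shaped like the Document_total_count / foo_exists / bar_exists example above. Each filter bucket's `doc_count` is the number of documents containing the field, and the total comes from `hits.total`. The function names, the `<field>_exists` naming and the `@timestamp` time field are assumptions for illustration.

```python
def field_presence_query(fields, time_field="@timestamp"):
    """One `filter` + `exists` aggregation per field, over the previous day."""
    return {
        "size": 0,
        "track_total_hits": True,  # on 7.x+, count all hits instead of stopping at 10,000
        "query": {"range": {time_field: {"gte": "now-1d/d", "lt": "now/d"}}},
        "aggs": {f"{f}_exists": {"filter": {"exists": {"field": f}}} for f in fields},
    }


def field_presence_doc(response, fields):
    """Collect the total hit count and each filter bucket's doc_count into one document."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # 7.x+ reports {"value": n, "relation": "eq"}
        total = total["value"]
    doc = {"document_total_count": total}
    for f in fields:
        doc[f"{f}_exists"] = response["aggregations"][f"{f}_exists"]["doc_count"]
    return doc
```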

Within the Rollup API?

Sorry for the delayed response everyone :slight_smile:

Yeah, I'm not sure Rollup is a perfect fit here. If you only want the top 10 (or whatever), Rollup will be doing a lot more work since it has to exhaustively save all the groups. It may be a trivial amount of extra work depending on the size of the data, so you could experiment and see. But the design goal of Rollup is to take all your existing data and "downsample" it to a coarser granularity, which means you'll be saving more than just the top n values.
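To make the "save all the groups" point concrete, here is a sketch of a rollup job body, assuming 6.x syntax and hypothetical index and field names. It would be sent with `PUT _xpack/rollup/job/<job_id>` on 6.x (`PUT _rollup/job/<job_id>` on 7.x). The `terms` group stores a bucket for every distinct value per day, which is the extra work compared with keeping only a top 10.

```python
def rollup_job_body(index_pattern, rollup_index, time_field, term_fields, metric_fields):
    """Build a rollup job config that downsamples to one bucket per day and term."""
    return {
        "index_pattern": index_pattern,
        "rollup_index": rollup_index,
        "cron": "0 0 1 * * ?",  # Quartz cron syntax: every day at 01:00
        "page_size": 1000,
        "groups": {
            # 6.x syntax; newer versions use "calendar_interval" or "fixed_interval"
            "date_histogram": {"field": time_field, "interval": "1d"},
            # Every distinct value of these fields gets its own rollup bucket,
            # not just the top 10.
            "terms": {"fields": term_fields},
        },
        "metrics": [
            {"field": f, "metrics": ["min", "max", "sum", "value_count"]}
            for f in metric_fields
        ],
    }
```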

A simpler way would be Watcher/Alerting (define a watch that runs on a schedule, includes the top n agg, and saves the result), but that's a Gold feature. :confused:
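For readers who do have a Gold license, a sketch of such a watch body, assuming 7.x syntax (`PUT _watcher/watch/<watch_id>`; 6.x uses `_xpack/watcher/watch/<watch_id>` and may expect a `doc_type` on the index action) and hypothetical index names. The `index` action stores the watch payload, which here is the search response.

```python
def watch_body(source_index, target_index, search_body, interval="1d"):
    """Build a watch that runs `search_body` on a schedule and indexes the response."""
    return {
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "search": {
                "request": {"indices": [source_index], "body": search_body}
            }
        },
        "actions": {
            # The index action writes the watch payload (the search response).
            "store_result": {"index": {"index": target_index}}
        },
    }
```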

If Rollup doesn't work for you, I think you'll probably have to write a script that executes your query, transforms the result into a document and indexes it, then schedule it with cron or something similar.
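A minimal sketch of that script using only the Python standard library, assuming an unsecured cluster at a URL you pass in and hypothetical index names; authentication, retries and error handling are left out. `run_once` sends any `_search` body, turns the response into a document with a caller-supplied function, stamps it with the run time and indexes it into a summary index. The HTTP call is injectable so the logic can be checked without a cluster.

```python
import json
import urllib.request
from datetime import datetime, timezone


def post_json(url, body):
    """POST a JSON body to Elasticsearch and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_once(es_url, source_index, target_index, query, to_doc, post=post_json):
    """Run `query`, reshape the result with `to_doc`, and index it as one document."""
    result = post(f"{es_url}/{source_index}/_search", query)
    doc = to_doc(result)
    doc["@timestamp"] = datetime.now(timezone.utc).isoformat()
    # POST <index>/_doc lets Elasticsearch generate the document ID.
    return post(f"{es_url}/{target_index}/_doc", doc)
```

Calling `run_once` from a small entry-point script and scheduling it with a crontab line such as `30 0 * * * python3 /path/to/script.py` (hypothetical path) would run it every day at 00:30.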

Thank you!
