Scheduled query service

YvorL · February 26, 2019, 9:44pm

Hi,

I'm looking for an existing service/application that would execute scheduled queries defined by me and insert the results into an ES index.
My goal is to aggregate data so that I can extend the retention time of key metrics. This way I'd be able to see trends over a longer period without keeping the granular data, which costs a lot of disk space (not to mention replicas and snapshots...).
Is there such a service, or do I need to write an app myself?

Thanks!

warkolm · February 26, 2019, 9:44pm

Why not just use the rollup API?
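For reference, a rollup job is a JSON body sent to the rollup API. Below is a minimal Python sketch that assembles one, assuming the 6.x body format (`index_pattern`, `rollup_index`, `cron`, `page_size`, `groups`, `metrics`). The index, field and job names in the usage line are hypothetical. On 6.x the body would be sent with `PUT _xpack/rollup/job/<job_id>` (7.x: `PUT _rollup/job/<job_id>`) and then started with the matching `_start` endpoint.

```python
import json


def build_rollup_job(index_pattern, rollup_index, timestamp_field, interval,
                     term_fields, metric_fields, cron="0 0 * * * ?", page_size=1000):
    """Assemble a rollup job body (6.x-style format; all names are caller-supplied)."""
    return {
        "index_pattern": index_pattern,  # source indices to roll up
        "rollup_index": rollup_index,    # where the downsampled docs are written
        "cron": cron,                    # how often the job runs (seconds-first cron)
        "page_size": page_size,          # bucket results processed per indexer iteration
        "groups": {
            # Every combination of time bucket and term value becomes one rollup doc.
            "date_histogram": {"field": timestamp_field, "interval": interval},
            "terms": {"fields": list(term_fields)},
        },
        # Numeric fields and the metrics to keep for each group.
        "metrics": [
            {"field": f, "metrics": ["min", "max", "sum", "avg", "value_count"]}
            for f in metric_fields
        ],
    }


if __name__ == "__main__":
    # Hypothetical names: daily buckets per host, keeping stats on "bytes".
    job = build_rollup_job("logs-*", "logs_rollup", "@timestamp", "1d", ["host"], ["bytes"])
    print(json.dumps(job, indent=2))
```

Because every time bucket/term combination is stored, the size of the rollup index grows with the cardinality of the `terms` fields.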

YvorL · February 27, 2019, 11:20am

It's not clear to me which components of X-Pack aren't available with the free subscription, so I usually don't even open anything related to X-Pack, since I can't afford it.

Thank you! I'll start reading about it and experimenting with it right away.

dadoonet · February 27, 2019, 11:46am

You can see that on that page: https://www.elastic.co/subscriptions

For example:

YvorL · February 27, 2019, 4:12pm

Thanks, I'll bookmark that page.

YvorL · February 27, 2019, 4:33pm

How likely is it that this feature (or any other feature showing the warning message) will be removed? And if it is removed, how long is the deprecation period usually?

dadoonet · February 27, 2019, 4:47pm

Sorry? Which message?

YvorL · February 28, 2019, 10:39am

Sorry, I meant to include it:

dadoonet · February 28, 2019, 1:36pm

This message does not mean that we are going to remove the feature. When we want to remove something, we deprecate the API first, which is not the case here.

This message basically says that:

- if we find out that it's a dangerous feature (think about data integrity or things like that), we might say "that was a bad idea, sorry";
- but most of the time (if not always), it just means that the API might be subject to change and that, at this stage, we cannot guarantee compatibility of this API between versions.

Once the API is stable/mature enough (i.e. we have a lot of positive feedback and not many issues with it), we can remove the message.

I hope this makes sense.

YvorL · February 28, 2019, 4:18pm

Thanks, that's good to know!

I think the Rollup API will be able to solve the majority of my problems, but there are things it lacks, or, more precisely, things that aren't in the scope of the API.

Let's say I have a million documents every day and I'd like to see how the top 10 of the XY term changes over time. Even if it isn't a high-cardinality field, I wouldn't want to create 3000 groups just to query them again.
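For comparison, the "top 10 over time" view can be computed from the raw data with a `terms` aggregation nested inside a `date_histogram`. A minimal sketch of the search body and of flattening the response, assuming a 6.x cluster (the `interval` parameter; 7.2+ prefers `calendar_interval`). The aggregation names `per_period`/`top_terms` and the field names are hypothetical. Note that per-bucket top-n term counts can be approximate when an index has several shards.

```python
def top_terms_over_time_query(timestamp_field, term_field, interval="1d", top_n=10):
    """Search body: one bucket per period, each holding that period's top-n terms."""
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "per_period": {
                # "interval" is the 6.x spelling; 7.2+ prefers "calendar_interval".
                "date_histogram": {"field": timestamp_field, "interval": interval},
                "aggs": {"top_terms": {"terms": {"field": term_field, "size": top_n}}},
            }
        },
    }


def top_terms_per_period(response):
    """Flatten the response into {period: [(term, count), ...]}."""
    return {
        bucket["key_as_string"]: [
            (t["key"], t["doc_count"]) for t in bucket["top_terms"]["buckets"]
        ]
        for bucket in response["aggregations"]["per_period"]["buckets"]
    }
```

Each entry of the flattened result is small enough to be stored as one document per period in a long-retention index.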

Sometimes I need to know how many documents I had in a given time period and how many of them had the field "foo", or the field "bar", etc.
Something like:
Document_total_count: 1,000,000
foo_exists: 6,782
bar_exists: 442,332

So I still need a way to store the result of a query which is executed periodically.

dadoonet · February 28, 2019, 4:34pm

Maybe @polyfractal has an idea?

MarkusT · March 1, 2019, 8:21am

Maybe you could try an exists filter aggregation, like

```json
{
  "size": 0,
  "aggs": {
    "results_with_field": {
      "filter": {
        "exists": {
          "field": "name"
        }
      }
    }
  }
}
```

and then read the bucket's doc_count, which is the number of documents that have the field.
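Extending the filter aggregation above, a `filters` aggregation with one named `exists` filter per field returns all the counts in a single request, while the total comes from `hits.total`. A minimal Python sketch that builds the request body and turns the response into the kind of summary document listed earlier. The aggregation name `has_field` and the `@timestamp` field are assumptions. `track_total_hits` is set so that 7.x reports an exact total instead of stopping at 10,000.

```python
def exists_counts_query(fields, start, end, timestamp_field="@timestamp"):
    """Search body counting all docs in [start, end) and the docs having each field."""
    return {
        "size": 0,
        "track_total_hits": True,  # exact total on 7.x (default stops at 10,000)
        "query": {"range": {timestamp_field: {"gte": start, "lt": end}}},
        "aggs": {
            "has_field": {
                # One named "exists" filter per field; each bucket's doc_count is the answer.
                "filters": {"filters": {f: {"exists": {"field": f}} for f in fields}}
            }
        },
    }


def exists_summary(response, fields):
    """Turn the search response into one flat summary document."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # 7.x returns {"value": ..., "relation": ...}
        total = total["value"]
    buckets = response["aggregations"]["has_field"]["buckets"]
    summary = {"Document_total_count": total}
    for f in fields:
        summary[f + "_exists"] = buckets[f]["doc_count"]
    return summary
```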

YvorL · March 1, 2019, 10:10am

Within the Rollup API?

polyfractal · March 1, 2019, 4:35pm

Sorry for the delayed response, everyone.

Yeah, I'm not sure Rollup is a perfect fit here. If you only want the top 10 (or whatever), Rollup will be doing a lot more work since it has to exhaustively save all the groups. It may be a trivial amount of extra work depending on the size of the data, so you could experiment and see. But the design goal of Rollup is to take all your existing data and "downsample" it to a larger granularity, which means you'll be saving more than just the top n values.

A simpler way would be Watcher/Alerting (define a watch that runs on a schedule, includes the top n agg, and saves the result), but that requires a Gold subscription.

If Rollup doesn't work for you, I think you'll probably have to write a script that executes your query and transforms the result into a document, then schedule it with cron or something similar.
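A minimal sketch of such a script in Python (standard library only), assuming an unsecured cluster at `http://localhost:9200`, a source index pattern `logs-*`, a keyword field `xy` and a summary index `daily-summary`, all hypothetical. It summarizes the previous UTC day (total count plus top 10 terms) and indexes one document per day, using the date as the document id so that re-running it overwrites instead of duplicating. It also assumes a cluster version that accepts `/<index>/_doc/<id>`.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical settings: adjust to your cluster and indices.
ES_URL = "http://localhost:9200"
SOURCE_INDEX = "logs-*"
SUMMARY_INDEX = "daily-summary"
TERM_FIELD = "xy"


def es_request(method, path, body):
    """Send a JSON request to Elasticsearch and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def previous_day(now):
    """Return (start, end) covering the full UTC day before `now`."""
    end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=1), end


def build_query(start, end, top_n=10):
    """Count the window's documents and fetch its top-n terms."""
    return {
        "size": 0,
        "track_total_hits": True,  # exact total on 7.x
        "query": {"range": {"@timestamp": {"gte": start.isoformat(), "lt": end.isoformat()}}},
        "aggs": {"top_terms": {"terms": {"field": TERM_FIELD, "size": top_n}}},
    }


def to_summary_doc(start, response):
    """Turn a search response into one small document for the summary index."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # 7.x returns {"value": ..., "relation": ...}
        total = total["value"]
    return {
        "@timestamp": start.isoformat(),
        "document_total_count": total,
        "top_terms": [
            {"term": b["key"], "count": b["doc_count"]}
            for b in response["aggregations"]["top_terms"]["buckets"]
        ],
    }


def main():
    start, end = previous_day(datetime.now(timezone.utc))
    response = es_request("POST", "/%s/_search" % SOURCE_INDEX, build_query(start, end))
    # The day is the document id, so a re-run for the same day overwrites the old doc.
    doc_id = start.strftime("%Y-%m-%d")
    es_request("PUT", "/%s/_doc/%s" % (SUMMARY_INDEX, doc_id), to_summary_doc(start, response))


if __name__ == "__main__":
    main()
```

It could then be scheduled with a crontab entry such as `15 0 * * * /usr/bin/python3 /opt/scripts/daily_summary.py` (the path is hypothetical), which runs it every day at 00:15 server time.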

YvorL · March 1, 2019, 4:55pm

Thank you!

system · March 29, 2019, 4:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
