I'm looking for an existing service/application that would execute scheduled queries I define and insert the results into an ES index.
My goal is to aggregate data so that I can extend the retention time of key metrics. This way I'd be able to see trends over a longer period without keeping granular data, which costs a lot of disk space (not to mention replicas and snapshots...).
Is there such a service, or do I need to write an app myself?
It's not clear to me which components of X-Pack aren't available with the free subscription, so I usually don't even open anything related to X-Pack since I can't afford it.
Thank you! I'll start reading and experimenting with it right away.
This message does not mean that we are going to remove the feature. When we want to remove something, we deprecate the API, which is not the case here.
The message basically says two things:
If we find out that it's a dangerous feature (think about data integrity or things like that), we might say "that was a bad idea, sorry."
But most of the time (if not always) it just means that the API might be subject to change and that, at this stage, we cannot guarantee compatibility between versions of this API.
When it is stable/mature enough (i.e. we have a lot of positive feedback or not many issues with the API), we can remove the message.
I think the Rollup API will be able to solve the majority of the problems I face, but there are things it lacks. Or, more precisely, they aren't in the scope of the API.
Let's say I have a million documents every day and I'd like to see how the top 10 values of term XY change over time. Even if it isn't a high-cardinality field, I wouldn't want to create 3000 groups just to query those again.
Sometimes I need to know how many documents I had in a given time period and how many had the field "foo" in it, or the field "bar", etc.
Something like:
Document_total_count: 1,000,000
foo_exists: 6,782
bar_exists: 442,332
So I still need a way to store the result of a query which is executed periodically.
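The counts described above (total documents plus how many contain "foo" or "bar") can be fetched in a single search request with one `filter` aggregation per field. The following is only a sketch: the `@timestamp` field name, the time range, and the helper names `build_counts_query` and `flatten_counts` are assumptions for illustration, not something from this thread.

```python
def build_counts_query(fields, time_field="@timestamp",
                       start="now-1d/d", end="now/d"):
    """Build a search body that counts all documents in a time range
    and, per field, how many of them contain that field."""
    aggs = {
        # A match_all filter aggregation yields the total document count.
        "document_total_count": {"filter": {"match_all": {}}},
    }
    for name in fields:
        # An exists filter counts only documents that have the field.
        aggs[f"{name}_exists"] = {"filter": {"exists": {"field": name}}}
    return {
        "size": 0,  # no hits needed, only aggregation results
        "query": {"range": {time_field: {"gte": start, "lt": end}}},
        "aggs": aggs,
    }


def flatten_counts(response):
    """Turn the aggregations part of a search response into a flat
    {name: count} dict that can be stored as one summary document."""
    return {
        name: agg["doc_count"]
        for name, agg in response["aggregations"].items()
    }
```

Sending `build_counts_query(["foo", "bar"])` to the `_search` endpoint of the source index and passing the response through `flatten_counts` would give a small document with the three numbers, ready to be indexed elsewhere.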
Yeah, I'm not sure Rollup is a perfect fit here. If you only want the top 10 (or whatever), Rollup will be doing a lot more work since it has to exhaustively save all the groups. It may be a trivial amount of extra work depending on the size of the data, so you could experiment and see. But the design goal of Rollup is to take all your existing data and "downsample" it to a larger granularity, which means you'll be saving more than just the top n values.
A simpler way would be Watcher/Alerting (define a watch on a schedule that includes the top n agg and saves the result), but that requires a Gold subscription.
If Rollup doesn't work for you, I think you'll probably have to write a script that executes your query and transforms the result into a document, then just schedule it with cron or something similar.
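A rough sketch of such a script, for the top 10 case, could look like the following. It uses only the Python standard library and the plain REST endpoints (`_search` and `_doc`). The URL, index names, field names, and time range are all hypothetical placeholders, and the `_doc` indexing path assumes a reasonably recent Elasticsearch version, so treat this as a starting point rather than a finished tool.

```python
import json
import urllib.request
from datetime import datetime, timezone

ES_URL = "http://localhost:9200"   # assumption: local, unsecured cluster
SOURCE_INDEX = "logs-*"            # hypothetical source index pattern
TARGET_INDEX = "top-terms-daily"   # hypothetical summary index
TERM_FIELD = "xy"                  # hypothetical keyword field


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def top_terms_query(field, size=10):
    """Search body: top `size` terms of `field` for the previous day."""
    return {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-1d/d", "lt": "now/d"}}},
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def to_summary_docs(response, field, run_at):
    """One small document per bucket, so each term can be charted over time."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return [
        {
            "@timestamp": run_at.isoformat(),
            "field": field,
            "rank": rank,
            "term": bucket["key"],
            "count": bucket["doc_count"],
        }
        for rank, bucket in enumerate(buckets, start=1)
    ]


def main():
    response = post_json(f"{ES_URL}/{SOURCE_INDEX}/_search",
                         top_terms_query(TERM_FIELD))
    docs = to_summary_docs(response, TERM_FIELD, datetime.now(timezone.utc))
    for doc in docs:
        # One request per document keeps the sketch simple; the bulk API
        # would be the usual choice for larger result sets.
        post_json(f"{ES_URL}/{TARGET_INDEX}/_doc", doc)


if __name__ == "__main__":
    main()
```

A crontab entry such as `5 0 * * * /usr/bin/python3 /path/to/top_terms.py` (the path is a placeholder) would run it shortly after midnight each day. Writing one document per term rather than a single document holding an array is a design choice made here so that a term's count can be filtered and plotted directly.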