Data frame vs datafeed vs rollup

liorg2 · January 5, 2020, 4:54pm

Hello

There are a few components in the Elastic stack whose differences I'm not sure I understand, or when to use which.
For ML I currently use a datafeed with a pre-defined query (in practice it seems to aggregate metrics over a given time frame).
Rollup adds functionality for aggregating raw indices to save query time and space.
And in the latest version, after upgrading from 6.7 to 7.5, there is a new feature called 'data frames', which sounds similar.

So, if my purpose is to aggregate data into smaller indices that can accept queries, and also to run ML jobs, which should I choose, and why are there several features with similar functionality?

thanks
Lior

edsavage · January 6, 2020, 2:57pm

Hi Lior,

All good questions! I will do my best to answer them in detail. Please bear with me while I formulate my response (with a little help from my team).

Kind Regards,

Ed

grabowskit · January 7, 2020, 7:37pm

Lior,

I believe what you want is Transforms (the 'data frames' feature you mention, renamed in 7.5), which allow you to convert existing Elasticsearch indices into summarized indices.

The other functionality you mention is very use-case specific and probably won't do what you need, based on your stated purpose. Datafeeds are only used for feeding data into ML anomaly detection jobs (and don't create other indices). Rollups are used to aggregate metric indices to reduce storage, and have a special _rollup_search endpoint that allows you to query across raw and summarized metric data.
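As a rough illustration of the transform suggestion above, here is a minimal sketch of a request body for `PUT _transform/<transform_id>` that pivots a raw index into a summarized one. The index names (`metrics-raw`, `metrics-by-host`) and field names (`host.name`, `cpu.pct`) are assumptions made up for this example, not anything from the original question, and the snippet only builds and prints the JSON rather than talking to a cluster.

```python
import json

# Hypothetical index names, chosen only for illustration.
SOURCE_INDEX = "metrics-raw"
DEST_INDEX = "metrics-by-host"


def build_transform_body(source_index, dest_index):
    """Build a pivot transform body (to be sent as PUT _transform/<transform_id>).

    The field names below (host.name, cpu.pct) are assumptions.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            # One output document per distinct host.
            "group_by": {
                "host": {"terms": {"field": "host.name"}},
            },
            # Summary values stored on each output document.
            "aggregations": {
                "cpu_avg": {"avg": {"field": "cpu.pct"}},
                "cpu_max": {"max": {"field": "cpu.pct"}},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_transform_body(SOURCE_INDEX, DEST_INDEX), indent=2))
```

The destination index is an ordinary index, so it can be queried with the normal search API, which is the main difference from a rollup index.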

Thanks for the question,

TomG

liorg2 · January 7, 2020, 10:19pm

Thanks a lot!

The issue with transforms (for me), is that I use percentiles aggregation often, and it seems to be missing from current supported aggregations.

Mark_Harwood · January 8, 2020, 11:08am

On this specific point: rollups are a form of compaction geared specifically towards grouping documents by time unit, while transforms typically group documents by a chosen entity, such as a customer ID.

A web session is an example of an entity that can span time units, so time-based rollups are not an appropriate mechanism for grouping that information.
A monthly rollup is not grouped on a single entity key (it's a range of timestamps), so transforms are not appropriate.

What you might want to summarise for entities vs time-groups might be similar (counts, flags etc) but the unit that you group things around is fundamentally different.
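To make the entity vs time-group distinction concrete, here is a small plain-Python sketch (no Elasticsearch involved) that sums the same made-up events two ways. The event data and the function names are assumptions invented for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up events: (timestamp, session_id, bytes). Session "s1" spans two months.
EVENTS = [
    ("2020-01-31T23:50:00", "s1", 100),
    ("2020-02-01T00:10:00", "s1", 250),
    ("2020-02-01T09:00:00", "s2", 40),
]


def group_by_entity(events):
    """Transform-style grouping: one bucket per session, regardless of time."""
    totals = defaultdict(int)
    for _, session_id, size in events:
        totals[session_id] += size
    return dict(totals)


def group_by_month(events):
    """Rollup-style grouping: one bucket per calendar month, regardless of session."""
    totals = defaultdict(int)
    for timestamp, _, size in events:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        totals[month] += size
    return dict(totals)


if __name__ == "__main__":
    print(group_by_entity(EVENTS))  # {'s1': 350, 's2': 40}
    print(group_by_month(EVENTS))   # {'2020-01': 100, '2020-02': 290}
```

Session `s1` stays whole in the entity grouping but is split across January and February in the monthly grouping, which is why a time-based rollup cannot stand in for a per-session summary.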

liorg2 · January 9, 2020, 7:46am

Do you guys know if percentiles are on the roadmap for transforms?
thanks

@Mark_Harwood
@grabowskit
@edsavage

sophie_chang · January 9, 2020, 9:43am

Yes, percentiles are on our roadmap, including handling for functions that return multiple values. It is something we are keen to do; however, we do not have committed timeframes for this yet.
