Data frame vs datafeed vs rollup

Hello

There are a few components in the Elastic Stack where I'm not sure I understand the differences, or when to use each one.
For ML, I currently use a datafeed with a predefined query (in practice, it seems to aggregate metrics over a given time frame).
Rollups add functionality to aggregate raw indices and save query time and storage space.
And in the latest version (I upgraded from 6.7 to 7.5), there is a new feature called 'data frames', which sounds similar.

So, if my goal is to aggregate data into smaller indices that can be queried and can also feed ML jobs, which should I choose? And why are there several features with similar functionality?

Thanks,
Lior

Hi Lior,

All good questions! I will do my best to answer them in detail. Please bear with me while I formulate my response (with a little help from my team).

Kind Regards,

Ed

Lior,

I believe what you want is transforms (what earlier releases called data frame transforms), which allow you to convert existing Elasticsearch indices into summarized indices.
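
For illustration, here is a minimal sketch of what a pivot transform definition looks like, built as a plain Python dict and printed as JSON. The index names and the `customer_id` / `response_time` fields are hypothetical, and I'm assuming a 7.5+ cluster where the body would be sent with `PUT _transform/<transform_id>`; check the transforms docs for your exact version.

```python
import json

# Hypothetical index names; replace with your own.
SOURCE_INDEX = "web-logs-raw"
DEST_INDEX = "web-logs-by-customer"


def build_transform_body(source_index, dest_index, entity_field, metric_field):
    """Return a pivot transform body: one output document per entity value."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "pivot": {
            # Group source documents by entity (a terms group_by).
            "group_by": {
                entity_field: {"terms": {"field": entity_field}}
            },
            # Summary metrics computed for each entity.
            "aggregations": {
                f"{metric_field}_avg": {"avg": {"field": metric_field}},
                f"{metric_field}_max": {"max": {"field": metric_field}},
            },
        },
    }


# Hypothetical fields: customer_id (keyword) and response_time (numeric).
body = build_transform_body(SOURCE_INDEX, DEST_INDEX, "customer_id", "response_time")
print(json.dumps(body, indent=2))
```

The destination index is an ordinary index, so it can be queried normally and used as the source of an ML datafeed.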

The other features you mention are very use-case-specific and probably won't do what you need, given your stated purpose. Datafeeds are used only to feed data into ML anomaly detection jobs (they don't create other indices). Rollups aggregate metric indices to reduce storage, and they have a special `_rollup_search` endpoint that allows you to query across raw and summarized metric data.
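
For comparison, here is a rough sketch of a rollup job definition and a matching rollup search body, again as Python dicts. The index pattern, rollup index, job name, and the `@timestamp` / `host` / `cpu_pct` fields are hypothetical. As I understand the 7.x API, the job is created with `PUT _rollup/job/<job_id>` and the summarized data is queried through `<index>/_rollup_search`. Note that the rollup always groups by a time interval first.

```python
import json

# Hypothetical names: raw metric indices "metrics-*", rollup index "metrics_rollup".
rollup_job = {
    "index_pattern": "metrics-*",      # raw indices to compact
    "rollup_index": "metrics_rollup",  # where rolled-up documents are written
    "cron": "0 0 * * * ?",             # run at the top of every hour (seconds field first)
    "page_size": 1000,
    "groups": {
        # Rollups always group by time; terms are optional extra dimensions.
        "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},
        "terms": {"fields": ["host"]},
    },
    "metrics": [
        {"field": "cpu_pct", "metrics": ["min", "max", "avg"]},
    ],
}

# Rollup search returns aggregations only, so size is set to 0.
rollup_search = {
    "size": 0,
    "aggregations": {"max_cpu": {"max": {"field": "cpu_pct"}}},
}

print("PUT _rollup/job/metrics_hourly")
print(json.dumps(rollup_job, indent=2))
print("GET metrics_rollup/_rollup_search")
print(json.dumps(rollup_search, indent=2))
```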

Thanks for the question,

TomG

Thanks a lot!

The issue with transforms, for me, is that I use the percentiles aggregation often, and it seems to be missing from the currently supported aggregations.

On this specific point: roll-ups are a form of compaction geared specifically towards grouping documents by time unit, while transforms typically group documents by an entity such as a customer ID.

A web session is an example of an entity that can span time units, so time-based roll-ups are not an appropriate mechanism for grouping that information.
A monthly roll-up is not grouped on a single entity key (it's a range of timestamps), so transforms are not appropriate.

What you summarise for entities and for time groups might be similar (counts, flags, etc.), but the unit that you group things around is fundamentally different.
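
To make the entity-vs-time distinction concrete, here is a small plain-Python sketch (no Elasticsearch involved; the event data and function names are hypothetical). It groups the same web events two ways: by session ID, in the spirit of an entity-centric transform, and by calendar month, in the spirit of a time-based roll-up. The session that crosses midnight at a month boundary stays whole in the first view but is split in the second.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical web events: (session_id, timestamp). Session "s1" crosses a month boundary.
events = [
    ("s1", datetime(2019, 11, 30, 23, 50)),
    ("s1", datetime(2019, 12, 1, 0, 5)),
    ("s2", datetime(2019, 12, 1, 9, 0)),
    ("s2", datetime(2019, 12, 1, 9, 20)),
]


def group_by_entity(evts):
    """Entity-centric summary: one row per session with count and first/last timestamps."""
    out = defaultdict(lambda: {"count": 0, "first": None, "last": None})
    for sid, ts in evts:
        row = out[sid]
        row["count"] += 1
        row["first"] = ts if row["first"] is None else min(row["first"], ts)
        row["last"] = ts if row["last"] is None else max(row["last"], ts)
    return dict(out)


def group_by_month(evts):
    """Time-centric summary: one event count per (year, month) bucket."""
    out = defaultdict(int)
    for _sid, ts in evts:
        out[(ts.year, ts.month)] += 1
    return dict(out)


by_session = group_by_entity(events)
by_month = group_by_month(events)
print(by_session["s1"]["count"])                           # 2: the whole session stays together
print(by_session["s1"]["last"] - by_session["s1"]["first"])  # 0:15:00 session duration
print(by_month)                                            # {(2019, 11): 1, (2019, 12): 3}: s1 is split
```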

Do you guys know whether percentiles are on the roadmap for transforms?
Thanks

@Mark_Harwood
@grabowskit
@edsavage

Yes, percentiles are on our roadmap, including handling for functions that return multiple values. It is something we are keen to do; however, we do not have a committed timeframe for this yet.
