ML Kibana: difference between by_field_name and partition_field_name

richcollier · August 1, 2019, 7:30pm

Yes, the numbers are more of an "order of magnitude" estimate. You can certainly run jobs with 100,000+ partitions if you're willing to provide the memory headroom. However, keep in mind that 1 job is tied to 1 ML node, so you'll never get horizontal scalability if you have 1 massive job instead of many smaller jobs.
In general, an ML job has a notion of when a thing first happens, which I'll call the "dawn of time". When the job sees data for an entity for the first time (e.g. the first time it sees host=X or error_code=Y), one of two situations applies:

1. That new entity is seen as "novel" and that, in itself, is notable and potentially worthy of being flagged as anomalous. To do that, you need to have your "dawn of time" be when the job starts.
2. That new entity is just part of the normal "expansion" of the data - perhaps a new server was added to the mix or a new product_id was added to the catalog. In this case, just start modeling that new entity and don't make a fuss about it showing up - and to do that, you need to have the "dawn of time" be when that entity first shows up.

When the analysis is split using by_field_name, the dawn of time is when the ML job was started; when it is split using partition_field_name, the dawn of time is when that partition first showed up in the data. As such, you will get different results if you split one way versus the other in a situation where something "new" comes along.
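
To make the two ways of splitting concrete, here is a minimal sketch in Python that builds two otherwise identical job bodies, one split with by_field_name and one with partition_field_name. The split field `host`, the `count` function, the 15m bucket span, the `@timestamp` time field and the `build_job` helper are all assumptions picked for illustration, not anything from your setup. Each body is the kind of JSON you would send when creating an anomaly detection job.

```python
import json


def build_job(split_key: str, split_field: str) -> dict:
    """Build a job body with one count detector split on split_field.

    split_key is either "by_field_name" or "partition_field_name".
    All other values are illustrative assumptions.
    """
    if split_key not in ("by_field_name", "partition_field_name"):
        raise ValueError(f"unsupported split key: {split_key}")
    return {
        "analysis_config": {
            "bucket_span": "15m",  # assumed bucket span
            "detectors": [
                {"function": "count", split_key: split_field},
            ],
        },
        "data_description": {"time_field": "@timestamp"},  # assumed time field
    }


# Same detector, split two different ways on the hypothetical field "host".
by_job = build_job("by_field_name", "host")
partition_job = build_job("partition_field_name", "host")

print(json.dumps(by_job, indent=2))
print(json.dumps(partition_job, indent=2))
```

The two bodies differ only in the key used for the split. Per the explanation above, that key is what decides whether a host that appears later is judged against the job's start or against its own first appearance.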
