Anomaly detection - partitioning data

Hi,

I've been working with the anomaly detection functionality in Kibana. I've got about 500 records processed, all of which have a transaction.name and a custom id that works as a tenant indicator (I'll call it tenantId from now on).
So I created an anomaly detection job with these detectors:
count partitionfield="transaction.name"
count partitionfield="tenntId"
and I get the data properly partitioned by all the values of these fields throughout the dataset. However, when I try to use a by field, things get weird:
count by "tenantId" partitionfield="transaction.name"
count by "transaction.name" partitionfield="tenantId"
I get the choice to select a combination of tenantId and transaction.name, but the dropdowns (in the Single Metric Viewer) are somewhat lacking in data: I can only choose one tenantId and one transaction.name. I've tried this a few times and can't get around it...
What I'm trying to achieve is to detect anomalies per tenant and per API method call, because some tenants may be much busier than others and some method calls may be used more than others.
So any ideas why I can't get a full set of possible tenantIds and transaction.names in those dropdowns?
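
For reference, the double-split detector corresponds to roughly this job configuration via the ML API (the job id, bucket span, and time field below are just placeholders, not necessarily what I actually used):

```
PUT _ml/anomaly_detectors/tenant_transaction_counts
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count",
        "by_field_name": "tenantId",
        "partition_field_name": "transaction.name",
        "detector_description": "count by tenantId partitionfield=transaction.name"
      }
    ],
    "influencers": ["tenantId", "transaction.name"]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```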

Please report the version you are using when you post questions. This is especially relevant here because the UI behavior in this area has changed over time (see https://github.com/elastic/kibana/issues/52618 for example).

Also, adding screenshots to your posts is very helpful for us.

I'm working with version 7.9.2

Thanks for the info. Now, when you say "somewhat lacking in data", I need to know what you mean by that.

Because it is possible that each combination of transaction.name and tenantId results in sparse data just by its nature.

Are you sure you really need the double split here? Why not just a single split (perhaps partitioning on transaction.name and just leaving tenantId as an influencer)?
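
A minimal sketch of that single-split job, assuming a 15-minute bucket span and a @timestamp time field (adjust both to your data), could look like:

```
PUT _ml/anomaly_detectors/transaction_name_counts
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count",
        "partition_field_name": "transaction.name",
        "detector_description": "count partitionfield=transaction.name"
      }
    ],
    "influencers": ["transaction.name", "tenantId"]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```

That way tenantId still shows up as an influencer on anomalies without multiplying the number of models.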

So when data is sparse, the options may not show in the dropdowns? So maybe if all the tenants generate enough data, it will show everything? Because it behaves awkwardly: when I do count by "tenantId" partitionfield="transaction.name", it returns only tenantId=2 and one transaction.name, and when I reverse it to count by "transaction.name" partitionfield="tenantId", I get tenantId=52 and a different transaction.name.

Unfortunately it's a requirement for the project I'm working on to have the double split here.

I was cautioning about sparse data with respect to the modeling. If you "oversplit" the data, you may end up in a situation where a unique combination of the by_field and the partition_field doesn't occur very frequently, leaving too few observations for adequate modeling. For example, if the ~500 documents you mentioned are spread across many unique tenantId/transaction.name combinations, each combination may only contribute a handful of observations.

In the UI, the dropdowns will only show entities that have anomaly records. For example, I just ran a contrived job of count by request.keyword partitionfield=geo.src on the sample Kibana web logs data set (there are 175+ unique request.keyword values and 160+ unique geo.src values). However, when the job ran, only 7 anomalies were found in the data set, all for the combination of geo.src:"CN" and request.keyword:"/beats/metricbeat". Therefore, the UI looks like this:

[screenshot: Single Metric Viewer dropdowns offering only geo.src "CN" and request.keyword "/beats/metricbeat"]

In other words, the dropdowns don't show any of the other values of geo.src or request.keyword.
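
If you want to check which entities actually have anomaly records (and will therefore show up in the dropdowns), you can query the job's results directly; for example (substitute your own job id):

```
GET _ml/anomaly_detectors/tenant_transaction_counts/results/records
{
  "record_score": 0,
  "sort": "record_score",
  "desc": true
}
```

Each record in the response carries partition_field_value and by_field_value, so you can see exactly which tenantId / transaction.name combinations produced anomalies.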
