Could you please explain the baseline calculation algorithm?
I have a few questions.
Is that something like mean value plus and minus standard deviation multiplied by some coefficient (3, for example)?
Which types of seasonality are used: daily (for each hour), weekly (for each day of the week)?
If we have collected timeseries data over a long period of time (months, years), what time period is used for the baseline calculation?
How often, and at what point, are baselines recalculated? (Weekly, daily, or maybe recalculation starts whenever new documents are written to the indices?)
Is it possible to customise the baseline calculation algorithm? For example, to change the coefficient?
And one question related to analysis functions.
Is it possible to add and use custom functions to analyze data?
Is that something like mean value plus and minus standard deviation multiplied by some coefficient (3, for example)?
Yes and no. A simple +/- 3 standard deviations is a simplified version of what we do. We do not assume a Gaussian distribution (which a standard-deviation rule does); instead we use ML techniques to find the best probability distribution model for the data.
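To illustrate why the Gaussian assumption matters, here is a minimal sketch using only the Python standard library. It is not the ML model described above: it compares a mean +/- 3 standard deviation band with bounds read directly from the data's empirical quantiles. The function names, the quantile levels and the log-normal sample data are all assumptions made for this illustration.

```python
import random
import statistics


def gaussian_bounds(values, k=3.0):
    """Mean +/- k standard deviations (implicitly assumes a Gaussian)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean - k * std, mean + k * std


def quantile_bounds(values, lower=0.001, upper=0.999):
    """Bounds taken from the empirical distribution, no Gaussian assumed."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Hypothetical skewed metric (e.g. response times): always positive,
# with a long right tail.
random.seed(0)
latencies = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]

g_lo, g_hi = gaussian_bounds(latencies)
q_lo, q_hi = quantile_bounds(latencies)

print(f"3-sigma band:  {g_lo:.2f} .. {g_hi:.2f}")
print(f"quantile band: {q_lo:.2f} .. {q_hi:.2f}")
```

On skewed data like this, the 3-sigma lower bound falls below zero even though the metric can never be negative, while the quantile-based bounds stay inside the range the data actually occupies. Fitting a distribution that matches the data's shape avoids that kind of mismatch.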
Which types of seasonality are used: daily (for each hour), weekly (for each day of the week)?
Daily and weekly for sure. We also look for other periodic patterns that don't fall on the typical daily/weekly/etc. boundaries.
If we have collected timeseries data over a long period of time (months, years), what time period is used for the baseline calculation?
All of the data is used, because the learning is continuous.
How often, and at what point, are baselines recalculated? (Weekly, daily, or maybe recalculation starts whenever new documents are written to the indices?)
Every bucket_span's worth of data affects the modeling.
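As a rough picture of "continuous learning per bucket", the sketch below groups events into fixed-width buckets and updates a running model once per bucket. It is a toy stand-in, not the actual modeling: the RunningStats class (Welford's online mean/variance), the bucket_means helper, the 900-second span and the sample events are all hypothetical.

```python
from collections import defaultdict


class RunningStats:
    """Online mean and variance (Welford's algorithm), updated per value."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance of the values seen so far.
        return self._m2 / self.count if self.count else 0.0


def bucket_means(events, bucket_span_seconds):
    """Average (timestamp, value) events within fixed-width time buckets."""
    sums = defaultdict(lambda: [0.0, 0])
    for timestamp, value in events:
        start = timestamp - timestamp % bucket_span_seconds
        sums[start][0] += value
        sums[start][1] += 1
    return [(start, total / n) for start, (total, n) in sorted(sums.items())]


# Hypothetical events: (epoch seconds, metric value), 15-minute buckets.
events = [(0, 10.0), (100, 14.0), (900, 20.0), (1000, 22.0), (1800, 30.0)]

model = RunningStats()
for start, mean in bucket_means(events, 900):
    model.update(mean)  # each completed bucket nudges the model
    print(start, mean, round(model.mean, 2))
```

Nothing is recomputed from scratch on a schedule here: each bucket updates the existing state, which is the idea behind a model where all past data contributes.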
Is it possible to customise the baseline calculation algorithm? For example, to change the coefficient?
No, not at this time.
Is it possible to add and use custom functions to analyze data?
No. However, you can customize the query filters, aggregations, etc. that are used by the datafeed before the data is fed to ML. That gives you a certain level of customization.
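As a sketch of that kind of customization, the snippet below builds a datafeed body with a query filter and an aggregation. The overall shape (job_id, indices, query, and a date_histogram aggregation containing a max on the time field) follows the documented datafeed format as I understand it, so please verify it against your stack version. The helper name, job ID, index pattern and field names are hypothetical.

```python
import json


def build_datafeed(job_id, index, time_field, value_field, interval, status):
    """Return a datafeed body that filters and pre-aggregates the data."""
    return {
        "job_id": job_id,
        "indices": [index],
        # Query filter: only documents with the given status reach ML.
        "query": {"bool": {"filter": [{"term": {"status": status}}]}},
        # Pre-aggregation: one averaged value per histogram bucket.
        "aggregations": {
            "buckets": {
                "date_histogram": {
                    "field": time_field,
                    "fixed_interval": interval,
                },
                "aggregations": {
                    # Latest timestamp in each bucket, keyed by the time field.
                    time_field: {"max": {"field": time_field}},
                    "avg_value": {"avg": {"field": value_field}},
                },
            }
        },
    }


# Hypothetical names throughout; adjust to your own job and indices.
body = build_datafeed(
    job_id="response-time-job",
    index="web-logs-*",
    time_field="@timestamp",
    value_field="response_time",
    interval="15m",
    status="200",
)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent when creating the datafeed for the job. The analysis itself is unchanged; only the data it sees is shaped by the query and the aggregation.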