Create machine learning job for trend analysis

elk2 · March 16, 2020, 9:33pm

I have a lot of data in Elasticsearch recording the tickets sold to men, women and children each day, like the following:

{
  "_index": "logstash-people-2020.01.22",
  "_type": "_doc",
  "_id": "I7xKznAB6obTCirlmSMC",
  "_version": 1,
  "_score": null,
  "_source": {
    "men": 111,
    "@timestamp": "2020-01-22T12:00:00.000Z",
    "path": "/tmp/people.json",
    "@version": "1",
    "tags": [
      "multiline"
    ],
    "country": "US",
    "children": 127,
    "host": "ubuntu",
    "message": "{ \"date\": \"2020-01-22 13:00:00\",\n\"country\": \"US\",\n\"men\": 111,\n\"women\": 26,\n\"childrens\": 127\n}",
    "women": 26,
    "date": "2020-01-22 13:00:00"
  },
  "fields": {
    "@timestamp": [
      "2020-01-22T12:00:00.000Z"
    ]
  },
  "sort": [
    1579694400000
  ]
}

I would like to use machine learning to create a sales forecast for the next week from the data available in Elasticsearch. How can I do this?

rashmi · March 17, 2020, 4:30am

Yes, you can try using machine learning: create a job and detect anomalies.

If you have a basic license, you can use the Data Visualizer to learn more about your data. In particular, if your data is stored in Elasticsearch and contains a time field, you can use the Data Visualizer to identify possible fields for anomaly detection.
You can also upload a CSV, NDJSON, or log file (up to 100 MB in size). The Data Visualizer identifies the file format and field mappings. You can then optionally import that data into an Elasticsearch index.

You need the following permissions to use the Data Visualizer with file upload:

cluster privileges: monitor, manage_ingest_pipelines
index privileges: read, manage, index
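
As a rough illustration, those privileges could be bundled into a role body for the Elasticsearch create-role API (PUT _security/role/<name>). The role name data_visualizer_upload and the index pattern logstash-people-* are assumptions made for this sketch. The snippet only builds and prints the JSON body; it does not contact a cluster.

```python
import json

# Hypothetical role name, chosen only for this example.
ROLE_NAME = "data_visualizer_upload"


def build_role_body(index_patterns):
    """Return a role body granting the privileges listed above."""
    return {
        "cluster": ["monitor", "manage_ingest_pipelines"],
        "indices": [
            {
                # Index patterns the role applies to (an assumption here).
                "names": list(index_patterns),
                "privileges": ["read", "manage", "index"],
            }
        ],
    }


role_body = build_role_body(["logstash-people-*"])
print(f"PUT _security/role/{ROLE_NAME}")
print(json.dumps(role_body, indent=2))
```

The printed body can be pasted into the Kibana Dev Tools console after the PUT line.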

More information can be found here: https://www.elastic.co/guide/en/kibana/current/xpack-ml.html

The Elastic machine learning anomaly detection feature automatically models the normal behavior of your time series data — learning trends, periodicity, and more — in real time to identify anomalies, streamline root cause analysis, and reduce false positives.

cc @Peter_Harverson, who can hopefully shed more light here.

Thanks
Rashmi

Peter_Harverson · March 17, 2020, 10:07am

A good place to start would be to create some single metric and multi metric anomaly detection jobs using the ML UI in Kibana - see this page in the docs for an overview. For example, you could create three single metric jobs to model the ticket sales for men, women and children, using sum(men), sum(women) and sum(children). Or you could combine these into one job using the multi metric wizard, with three detectors: sum(men), sum(women) and sum(children). A bucket span of 1 day would seem suitable, although this would depend on how far back in time your data set goes.
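
For anyone who prefers the API to the wizard, here is a minimal sketch of what the combined job might look like as request bodies for PUT _ml/anomaly_detectors/<job_id> and PUT _ml/datafeeds/<datafeed_id>. The job ID ticket-sales and the index pattern logstash-people-* (guessed from the sample document's index name) are assumptions. The code only builds and prints the JSON; it does not talk to a cluster.

```python
import json

JOB_ID = "ticket-sales"              # hypothetical job ID
INDEX_PATTERN = "logstash-people-*"  # assumed from the sample document's index


def build_job_config(fields, bucket_span="1d", time_field="@timestamp"):
    """Build an anomaly detection job body with one sum() detector per field."""
    detectors = [{"function": "sum", "field_name": name} for name in fields]
    return {
        "analysis_config": {"bucket_span": bucket_span, "detectors": detectors},
        "data_description": {"time_field": time_field},
    }


def build_datafeed_config(job_id, index_pattern):
    """Build the datafeed body that feeds the matching indices into the job."""
    return {"job_id": job_id, "indices": [index_pattern]}


job_config = build_job_config(["men", "women", "children"])
datafeed_config = build_datafeed_config(JOB_ID, INDEX_PATTERN)

print(f"PUT _ml/anomaly_detectors/{JOB_ID}")
print(json.dumps(job_config, indent=2))
print(f"PUT _ml/datafeeds/datafeed-{JOB_ID}")
print(json.dumps(datafeed_config, indent=2))
```

Passing a single field name to build_job_config gives the equivalent of one of the three single metric jobs.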

Having run these jobs to learn the trends in your data, you can then run a forecast for the following week from the Single Metric Viewer. This blog discusses the forecasting functionality in more detail.
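
The same forecast can also be requested through the forecast API (POST _ml/anomaly_detectors/<job_id>/_forecast) once the job is open and has processed data. A small sketch, again assuming a hypothetical job ID of ticket-sales, that only builds the request rather than sending it:

```python
import json


def build_forecast_request(job_id, days=7):
    """Return the path and body for a forecast covering the next `days` days."""
    path = f"_ml/anomaly_detectors/{job_id}/_forecast"
    body = {"duration": f"{days}d"}
    return path, body


# "ticket-sales" is a hypothetical job ID used only for illustration.
path, body = build_forecast_request("ticket-sales")
print(f"POST {path}")
print(json.dumps(body))
```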

I see there is also a country field in your data, so you may want to explore the effect of this variable, as well as the time factor, if your sales data is split across different countries. For this, you could try using a regression analysis to investigate the relationship between different fields in your sales data.
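
As a loose sketch of how such a regression could be set up with data frame analytics (PUT _ml/data_frame/analytics/<id>): the analytics ID, the destination index name, and the choice of men as the dependent variable are all assumptions made for illustration, not a recommendation for this data set. The code only builds and prints the request body.

```python
import json

ANALYTICS_ID = "ticket-sales-regression"  # hypothetical analytics job ID


def build_regression_config(source_index, dest_index, dependent_variable, fields):
    """Build a data frame analytics body for a regression on the given fields."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {"regression": {"dependent_variable": dependent_variable}},
        # Restrict the analysis to the fields of interest.
        "analyzed_fields": {"includes": list(fields)},
    }


regression_config = build_regression_config(
    source_index="logstash-people-*",       # assumed from the sample document
    dest_index="people-regression-results",  # hypothetical destination index
    dependent_variable="men",                # arbitrary choice for this sketch
    fields=["men", "women", "children", "country"],
)
print(f"PUT _ml/data_frame/analytics/{ANALYTICS_ID}")
print(json.dumps(regression_config, indent=2))
```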

Hope that helps.
Pete

system · April 14, 2020, 10:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
