Problems with importing data for forecasting

Hi All,

Super new to ELK. I have set up two VMs in the cloud, running Elasticsearch and Kibana respectively.

I am experimenting with the dashboard and imported an Excel sheet which has MySQL logs.

I was trying to perform predictive analysis; however, I encounter the error: Anomaly detection can only be run over indices which are time based.

Could someone please help me understand what I am lacking, what kind of data is needed for predictive analysis, or whether there are any links I have missed?

Any help deeply appreciated.

Thanks in advance!!

Hello,

Yes, your log lines need to have (at the bare minimum) a timestamp and a log message. The timestamp must be properly parsed, converted, and stored in Elasticsearch (there's a mapping sketch after the list below). In general, the CSV import facility inside of ML can handle this for most timestamp formats. You can see from the docs that the following date formats are supported:

dd/MMM/YYYY:HH:mm:ss Z
EEE MMM dd HH:mm zzz YYYY
EEE MMM dd HH:mm:ss YYYY
EEE MMM dd HH:mm:ss zzz YYYY
EEE MMM dd YYYY HH:mm zzz
EEE MMM dd YYYY HH:mm:ss zzz
EEE, dd MMM YYYY HH:mm Z
EEE, dd MMM YYYY HH:mm ZZ
EEE, dd MMM YYYY HH:mm:ss Z
EEE, dd MMM YYYY HH:mm:ss ZZ
ISO8601
MMM d HH:mm:ss
MMM d HH:mm:ss,SSS
MMM d YYYY HH:mm:ss
MMM dd HH:mm:ss
MMM dd HH:mm:ss,SSS
MMM dd YYYY HH:mm:ss
MMM dd, YYYY h:mm:ss a
TAI64N
UNIX
UNIX_MS
YYYY-MM-dd HH:mm:ss
YYYY-MM-dd HH:mm:ss,SSS
YYYY-MM-dd HH:mm:ss,SSS Z
YYYY-MM-dd HH:mm:ss,SSSZ
YYYY-MM-dd HH:mm:ss,SSSZZ
YYYY-MM-dd HH:mm:ssZ
YYYY-MM-dd HH:mm:ssZZ
YYYYMMddHHmmss
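
Just to illustrate what "properly parsed and stored" means: the time field should end up mapped as a date type in the index. The CSV import wizard normally sets this up for you, so this is only a sketch of what a hand-rolled 6.x mapping would look like (the index name mysql-metrics is made up):

PUT mysql-metrics
{
  "mappings": {
    "_doc": {
      "properties": {
        "Time": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}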

Maybe you can post a few example lines from your CSV so that we can see what's up...


Thank you Richcollier!

I have imported an Excel sheet saved with a .csv extension and want to attempt Forecasting/Predictive Analysis on it.

Here is the sample data:

| Time | SQL Server Memory Usage (MB) | SQL Server Locked Pages Allocation (MB) | SQL Server Large Pages Allocation (MB) | page_fault_count | memory_utilization_percentage | available_commit_limit_kb | process_physical_memory_low | process_virtual_memory_low |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |

Please let me know if this minimal amount of data works. Also, what are the prerequisites for creating an alert? i.e. does it have to be real-time data? I ask this because it said a time field is required to create one.

Thanks again.. awaiting your quick response.

The problem seems to be with your timestamp format. In the above, the Time field looks like:

2019-03-15:16:27:00 PM

If instead it looked like:

2019-03-15 16:27:00

(no extra : between the 15 and 16 and no PM on the end) then the data would be ingested properly and the Time field would automatically be recognized.
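
If reworking the spreadsheet by hand is a pain, one alternative (just a sketch, not something you must do) is to clean the string into that yyyy-MM-dd HH:mm:ss shape and let an ingest pipeline with a date processor turn it into a proper date field at index time. The pipeline and field names below are made up for illustration:

PUT _ingest/pipeline/parse_mysql_time
{
  "description": "Parse the cleaned Time column into @timestamp (sketch)",
  "processors": [
    {
      "date": {
        "field": "Time",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "target_field": "@timestamp",
        "timezone": "UTC"
      }
    }
  ]
}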

Thanks so much!

I did manage to get the time field in and create a job as well. However, when I run a Forecast on it I get this error: [status_exception] Cannot run forecast: Forecast cannot be executed as job requires data to have been processed and modeled

The data i used is below:

| Time | SQL Server Memory Usage (MB) | SQL Server Locked Pages Allocation (MB) | SQL Server Large Pages Allocation (MB) | page_fault_count | memory_utilization_percentage | available_commit_limit_kb | process_physical_memory_low | process_virtual_memory_low |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2019-02-15T05:01:55.000+0000 | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-02-15T05:01:45.000+0000 | 24668 | 0 | 0 | 50298367 | 101 | 81667232 | 0 | 0 |
| 2019-02-15T00:02:41.000+0000 | 24663 | 0 | 0 | 50298367 | 103 | 81667232 | 0 | 0 |
| 2019-02-15T00:02:31.000+0000 | 24669 | 0 | 0 | 50298367 | 104 | 81667232 | 0 | 0 |

Job messages are:

Could you please help?

Thanks again!

Reading the error messages, it seems as if you're having problems with two necessary indices that ML relies on to operate - .ml-state and .ml-anomalies-shared

What do you see if you go to Index management for these indices? For example:
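
If it's easier to paste text than a screenshot, roughly the same information can be pulled from Dev Tools with a _cat request (the .ml-* pattern below just matches the ML system indices):

GET _cat/indices/.ml-*?v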

Here is what you asked for:

Ok, well it seems as if those indices are actually OK, despite what the error log indicated.

Can you also post the output of this API call against your job config:

GET _xpack/ml/anomaly_detectors/highsummemory/_stats

Could you please guide me on how to do that? Sorry, still new...

In Kibana, go to "Dev Tools" on the left menu, and then make sure "Console" is selected at the top. Paste in the text from the previous message, then click the arrow:

Paste the returned contents from the right-hand side back here.

Here it is:

{
  "count" : 1,
  "jobs" : [
    {
      "job_id" : "highsummemory",
      "data_counts" : {
        "job_id" : "highsummemory",
        "processed_record_count" : 0,
        "processed_field_count" : 0,
        "input_bytes" : 0,
        "input_field_count" : 0,
        "invalid_date_count" : 0,
        "missing_field_count" : 0,
        "out_of_order_timestamp_count" : 0,
        "empty_bucket_count" : 0,
        "sparse_bucket_count" : 0,
        "bucket_count" : 0,
        "input_record_count" : 0
      },
      "model_size_stats" : {
        "job_id" : "highsummemory",
        "result_type" : "model_size_stats",
        "model_bytes" : 0,
        "total_by_field_count" : 0,
        "total_over_field_count" : 0,
        "total_partition_field_count" : 0,
        "bucket_allocation_failures_count" : 0,
        "memory_status" : "ok",
        "log_time" : 1553618543184
      },
      "forecasts_stats" : {
        "total" : 4,
        "forecasted_jobs" : 1,
        "memory_bytes" : {
          "total" : 0.0,
          "min" : 0.0,
          "avg" : 0.0,
          "max" : 0.0
        },
        "records" : {
          "total" : 0.0,
          "min" : 0.0,
          "avg" : 0.0,
          "max" : 0.0
        },
        "processing_time_ms" : {
          "total" : 0.0,
          "min" : 0.0,
          "avg" : 0.0,
          "max" : 0.0
        },
        "status" : {
          "failed" : 4
        }
      },
      "state" : "opened",
      "node" : {
        "id" : "VW-G7JA1RqCDn_QOZJpF7w",
        "name" : "VW-G7JA",
        "ephemeral_id" : "aKFtymwoRieUFhLHbjWMaA",
        "transport_address" : "172.31.28.20:9300",
        "attributes" : {
          "ml.machine_memory" : "1031708672",
          "xpack.installed" : "true",
          "ml.max_open_jobs" : "20",
          "ml.enabled" : "true"
        }
      },
      "assignment_explanation" : "",
      "open_time" : "21s"
    }
  ]
}

I'm a little confused. It looks to me as if you've attempted to run a forecast for a job without having that job first learn on historical data.

Perhaps if you can describe the steps you've taken so far we can figure out what you are having issues with...

I might have missed something... Could you elaborate on your sentence: "attempted to run a forecast for a job without having that job first learn on historical data"?

I am trying to perform the Forecast on just the 4 lines of logs. Are you saying that is the problem?

I did read that a minimum of 3 weeks of data is required to forecast..

Please shed some more light on the same.

Thank you for your patience on this. Deeply Appreciated!!

Forecasting is ONLY possible if the ML job has already been run on a sufficient amount of historical data. This is required because forecasting leverages the models from the anomaly detection execution. More information is found here: https://www.elastic.co/blog/elasticsearch-machine-learning-on-demand-forecasting

Also, you cannot do anything (anomaly detection, or forecasting) adequately with only 4 samples. You need a continuous time-series of many samples (at least a few hundred and at least a few hours at the BARE minimum). A good rule of thumb is about 3 weeks or more worth of data.

Also, you should only attempt to forecast out an amount of time equal to or less than the amount of data that has been learned. So, it would not be advisable to attempt to forecast 8 weeks into the future if only 1 week has ever been seen.
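
For what it's worth, here is roughly what that sequence looks like through the APIs. The datafeed name (datafeed-highsummemory) and the date range are assumptions based on typical wizard defaults; the "Start datafeed" option in Job Management does the same thing from the UI. First replay the historical data through the job:

POST _xpack/ml/datafeeds/datafeed-highsummemory/_start
{
  "start": "2019-02-01T00:00:00Z",
  "end": "2019-03-01T00:00:00Z"
}

and then, once the job has processed and modeled that data, request a forecast:

POST _xpack/ml/anomaly_detectors/highsummemory/_forecast
{
  "duration": "1d"
}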

Thank you so much for your prompt and immediate responses!

Also, just out of curiosity, would it be possible to input an hour of data and predict something for the next 15 or 5 minutes?

Is it possible to create alerts with this few lines of code? Or does that have to fulfill certain requirements?

It would depend on the number of samples in that hour (it would need to be hundreds). So, in practical terms, you probably don't have that, thus it would not be enough. I'm not sure why there's a hesitation on getting more data (like a week's worth). Is that impractical for you?

> Is it possible to create alerts with this few lines of code? Or does that have to fulfill certain requirements?

Not sure what you are referring to by "this few lines of code". But, in general, yes you can create alerts from anomaly detection and/or forecasting results from ML. The information is stored back into an ES index that you can alert on via Watcher (X-Pack Alerting): Alerting on Machine Learning Jobs in Elasticsearch | Elastic Blog
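
As a rough sketch of the kind of watch that blog post describes (the watch name, the 75-point score threshold, and the 30-minute look-back below are placeholders, not recommendations), you could query the ML results index for high-scoring buckets of your job:

PUT _xpack/watcher/watch/highsummemory_anomaly_alert
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": [".ml-anomalies-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "term":  { "job_id": "highsummemory" } },
                { "term":  { "result_type": "bucket" } },
                { "range": { "anomaly_score": { "gte": 75 } } },
                { "range": { "timestamp": { "gte": "now-30m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log_anomaly": {
      "logging": { "text": "ML job highsummemory reported an anomaly score >= 75 in the last 30 minutes" }
    }
  }
}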
