Problems with importing data for forecasting

Hi All,

Super new to ELK. I have set up two VMs in the cloud, running Elasticsearch and Kibana respectively.

I am experimenting with the dashboard and imported an Excel sheet which has MySQL logs.

I was trying to perform predictive analysis; however, I encounter the error: Anomaly detection can only be run over indices which are time based.

Could someone please help me understand what I am lacking, what kind of data is needed for predictive analysis, or whether there are any links I have missed?

Any help deeply appreciated.

Thanks in advance!!

Hello,

Yes, your log lines need to have (at the bare minimum) a timestamp and a log message. The timestamp must be properly parsed, converted, and stored in Elasticsearch (there's a mapping sketch after the list below). In general, the CSV import facility inside of ML can handle this for most timestamp formats. You can see from the docs that the following date formats are supported:

dd/MMM/YYYY:HH:mm:ss Z
EEE MMM dd HH:mm zzz YYYY
EEE MMM dd HH:mm:ss YYYY
EEE MMM dd HH:mm:ss zzz YYYY
EEE MMM dd YYYY HH:mm zzz
EEE MMM dd YYYY HH:mm:ss zzz
EEE, dd MMM YYYY HH:mm Z
EEE, dd MMM YYYY HH:mm ZZ
EEE, dd MMM YYYY HH:mm:ss Z
EEE, dd MMM YYYY HH:mm:ss ZZ
ISO8601
MMM d HH:mm:ss
MMM d HH:mm:ss,SSS
MMM d YYYY HH:mm:ss
MMM dd HH:mm:ss
MMM dd HH:mm:ss,SSS
MMM dd YYYY HH:mm:ss
MMM dd, YYYY h:mm:ss a
TAI64N
UNIX
UNIX_MS
YYYY-MM-dd HH:mm:ss
YYYY-MM-dd HH:mm:ss,SSS
YYYY-MM-dd HH:mm:ss,SSS Z
YYYY-MM-dd HH:mm:ss,SSSZ
YYYY-MM-dd HH:mm:ss,SSSZZ
YYYY-MM-dd HH:mm:ssZ
YYYY-MM-dd HH:mm:ssZZ
YYYYMMddHHmmss
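
Just to illustrate what "properly parsed and stored" means: the time field should end up mapped as a date type in the index. The CSV import wizard normally sets this up for you, so this is only a sketch of what a hand-rolled 6.x mapping would look like (the index name mysql-metrics is made up):

PUT mysql-metrics
{
  "mappings": {
    "_doc": {
      "properties": {
        "Time": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}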

Maybe you can post a few example lines from your CSV so that we can see what's up...


Thank you Richcollier!

I have imported an Excel sheet saved with a .csv extension and want to attempt Forecasting/Predictive Analysis on it.

Here is the sample data:

| Time | SQL Server Memory Usage (MB) | SQL Server Locked Pages Allocation (MB) | SQL Server Large Pages Allocation (MB) | page_fault_count | memory_utilization_percentage | available_commit_limit_kb | process_physical_memory_low | process_virtual_memory_low |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-03-15:16:27:00 PM | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |

Please let me know if this minimal amount of data works. Also, what are the prerequisites for creating an alert? i.e. does it have to be real-time data? I ask this because it said a time field is required to create one.

Thanks again.. awaiting your quick response.

The problem seems to be with your timestamp format. In the above, the Time field looks like:

2019-03-15:16:27:00 PM

If instead it looked like:

2019-03-15 16:27:00

(no extra : between the 15 and 16 and no PM on the end) then the data would be ingested properly and the Time field would automatically be recognized.
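
If reworking the spreadsheet by hand is a pain, one alternative (just a sketch, not something you must do) is to clean the string into that yyyy-MM-dd HH:mm:ss shape and let an ingest pipeline with a date processor turn it into a proper date field at index time. The pipeline and field names below are made up for illustration:

PUT _ingest/pipeline/parse_mysql_time
{
  "description": "Parse the cleaned Time column into @timestamp (sketch)",
  "processors": [
    {
      "date": {
        "field": "Time",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "target_field": "@timestamp",
        "timezone": "UTC"
      }
    }
  ]
}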

Thanks so much!

I did manage to get the time field in and create a job as well. However, when I run a Forecast on it I get this error: [status_exception] Cannot run forecast: Forecast cannot be executed as job requires data to have been processed and modeled

The data i used is below:

| Time | SQL Server Memory Usage (MB) | SQL Server Locked Pages Allocation (MB) | SQL Server Large Pages Allocation (MB) | page_fault_count | memory_utilization_percentage | available_commit_limit_kb | process_physical_memory_low | process_virtual_memory_low |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2019-02-15T05:01:55.000+0000 | 24667 | 0 | 0 | 50298367 | 100 | 81667232 | 0 | 0 |
| 2019-02-15T05:01:45.000+0000 | 24668 | 0 | 0 | 50298367 | 101 | 81667232 | 0 | 0 |
| 2019-02-15T00:02:41.000+0000 | 24663 | 0 | 0 | 50298367 | 103 | 81667232 | 0 | 0 |
| 2019-02-15T00:02:31.000+0000 | 24669 | 0 | 0 | 50298367 | 104 | 81667232 | 0 | 0 |

Job messages are:

Could you please help?

Thanks again!

Reading the error messages, it seems as if you're having problems with two necessary indices that ML relies on to operate - .ml-state and .ml-anomalies-shared

What do you see if you go to Index management for these indices? For example:
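
If it's easier to paste text than a screenshot, roughly the same information can be pulled from Dev Tools with a _cat request (the .ml-* pattern below just matches the ML system indices):

GET _cat/indices/.ml-*?v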

Here is what you asked for:

Ok, well it seems as if those indices are actually OK, despite what the error log indicated.

Can you also post the output of this API call against your job config:

GET _xpack/ml/anomaly_detectors/highsummemory/_stats

Could you please guide me on how to do that? Sorry, still new...

In Kibana, go to "Dev Tools" on the left menu, and then make sure "Console" is selected at the top. Paste in the text from the previous message, then click the arrow:

Paste the returned contents from the right-hand side back here.

Here it is:

{
  "count" : 1,
  "jobs" : [
    {
      "job_id" : "highsummemory",
      "data_counts" : {
        "job_id" : "highsummemory",
        "processed_record_count" : 0,
        "processed_field_count" : 0,
        "input_bytes" : 0,
        "input_field_count" : 0,
        "invalid_date_count" : 0,
        "missing_field_count" : 0,
        "out_of_order_timestamp_count" : 0,
        "empty_bucket_count" : 0,
        "sparse_bucket_count" : 0,
        "bucket_count" : 0,
        "input_record_count" : 0
      },
      "model_size_stats" : {
        "job_id" : "highsummemory",
        "result_type" : "model_size_stats",
        "model_bytes" : 0,
        "total_by_field_count" : 0,
        "total_over_field_count" : 0,
        "total_partition_field_count" : 0,
        "bucket_allocation_failures_count" : 0,
        "memory_status" : "ok",
        "log_time" : 1553618543184
      },
      "forecasts_stats" : {
        "total" : 4,
        "forecasted_jobs" : 1,
        "memory_bytes" : {
          "total" : 0.0,
          "min" : 0.0,
          "avg" : 0.0,
          "max" : 0.0
        },
        "records" : {
          "total" : 0.0,
          "min" : 0.0,
          "avg" : 0.0,
          "max" : 0.0
        },
        "processing_time_ms" : {
          "total" : 0.0,
          "min" : 0.0,
          "avg" : 0.0,
          "max" : 0.0
        },
        "status" : {
          "failed" : 4
        }
      },
      "state" : "opened",
      "node" : {
        "id" : "VW-G7JA1RqCDn_QOZJpF7w",
        "name" : "VW-G7JA",
        "ephemeral_id" : "aKFtymwoRieUFhLHbjWMaA",
        "transport_address" : "172.31.28.20:9300",
        "attributes" : {
          "ml.machine_memory" : "1031708672",
          "xpack.installed" : "true",
          "ml.max_open_jobs" : "20",
          "ml.enabled" : "true"
        }
      },
      "assignment_explanation" : "",
      "open_time" : "21s"
    }
  ]
}

I'm a little confused. It looks to me as if you've attempted to run a forecast for a job without having that job first learn on historical data.

Perhaps if you can describe the steps you've taken so far we can figure out what you are having issues with...

I might have missed something... Could you elaborate on your sentence: "attempted to run a forecast for a job without having that job first learn on historical data"?

I am trying to perform the Forecast on just the 4 lines of logs. Are you saying that is the problem?

I did read that a minimum of 3 weeks of data is required to forecast..

Please shed some more light on the same.

Thank you for your patience on this. Deeply Appreciated!!

Forecasting is ONLY possible if the ML job has already been run on a sufficient amount of historical data. This is required because forecasting leverages the models from the anomaly detection execution. More information is found here: https://www.elastic.co/blog/elasticsearch-machine-learning-on-demand-forecasting

Also, you cannot do anything (anomaly detection, or forecasting) adequately with only 4 samples. You need a continuous time-series of many samples (at least a few hundred and at least a few hours at the BARE minimum). A good rule of thumb is about 3 weeks or more worth of data.

Also, you should only attempt to forecast out an amount of time equal to or less than the amount of data that has been learned. So, it would not be advisable to attempt to forecast 8 weeks into the future if only 1 week has ever been seen.
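
For what it's worth, here is roughly what that sequence looks like through the APIs. The datafeed name (datafeed-highsummemory) and the date range are assumptions based on typical wizard defaults; the "Start datafeed" option in Job Management does the same thing from the UI. First replay the historical data through the job:

POST _xpack/ml/datafeeds/datafeed-highsummemory/_start
{
  "start": "2019-02-01T00:00:00Z",
  "end": "2019-03-01T00:00:00Z"
}

and then, once the job has processed and modeled that data, request a forecast:

POST _xpack/ml/anomaly_detectors/highsummemory/_forecast
{
  "duration": "1d"
}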

Thank you so much for your prompt and immediate responses!

Also, just out of curiosity, would it be possible to input an hour of data and predict something for the next 15 or 5 minutes?

Is it possible to create alerts with this few lines of code? Or does that have to fulfill certain requirements?

It would depend on the number of samples in that hour (it would need to be hundreds). So, in practical terms, you probably don't have that, thus it would not be enough. I'm not sure why there's a hesitation on getting more data (like a week's worth). Is that impractical for you?

> Is it possible to create alerts with this few lines of code? Or does that have to fulfill certain requirements?

Not sure what you are referring to by "this few lines of code". But, in general, yes you can create alerts from anomaly detection and/or forecasting results from ML. The information is stored back into an ES index that you can alert on via Watcher (X-Pack Alerting): Alerting on Machine Learning Jobs in Elasticsearch | Elastic Blog
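
As a rough sketch of the kind of watch that blog post describes (the watch name, the 75-point score threshold, and the 30-minute look-back below are placeholders, not recommendations), you could query the ML results index for high-scoring buckets of your job:

PUT _xpack/watcher/watch/highsummemory_anomaly_alert
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": [".ml-anomalies-*"],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "term":  { "job_id": "highsummemory" } },
                { "term":  { "result_type": "bucket" } },
                { "range": { "anomaly_score": { "gte": 75 } } },
                { "range": { "timestamp": { "gte": "now-30m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "log_anomaly": {
      "logging": { "text": "ML job highsummemory reported an anomaly score >= 75 in the last 30 minutes" }
    }
  }
}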
