I would like to run anomaly detection on a subset of an index. It looks like I am supposed to set up a filter list containing the values I wish to select, and then create a custom rule to associate a field with a filter list and an action. I am supposed to edit the JSON directly in the ML job to do this. I have multiple fields and filter lists to apply to my index.
I am using v7.12.1. I am looking for examples of what the JSON should look like, information on what works in this version (the only examples I can find are for later versions), and some advice on how I can learn to write JSON well enough to apply complex SQL-style "where" conditions to indexes.
It seems like applying a "where" condition to an index for anomaly detection would be a common task. Are there any plans to add a GUI for this?
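For reference, here is a rough sketch of the filter-list-plus-custom-rule setup described above, in Kibana Dev Tools console style. The filter ID `safe_hosts`, the field `hostname`, the job name, and the `skip_result` action are illustrative assumptions, not taken from the original question:

```json
PUT _ml/filters/safe_hosts
{
  "description": "Hosts to exclude from anomaly results",
  "items": ["host1.example.com", "host2.example.com"]
}

PUT _ml/anomaly_detectors/my_example_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "count",
        "by_field_name": "hostname",
        "custom_rules": [
          {
            "actions": ["skip_result"],
            "scope": {
              "hostname": {
                "filter_id": "safe_hosts",
                "filter_type": "include"
              }
            }
          }
        ]
      }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Multiple rules can be listed in the `custom_rules` array, and a rule's `scope` can reference several fields, each tied to its own filter list.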
This works. But what if I also wanted to do an aggregation in order to speed up the anomaly detection processing? In your example above, what if I wanted to feed in an aggregation of the count of records (or the sum of some other numeric field) matching the 404 keyword, per day, instead of each individual record from the index? I do not have permission in PRD to create a new index, so it would have to be a saved search, edited JSON, a GUI, or something like that.
The aggregation does work. Here are a few tips on how I modified the JSON, using v7.12.1.
First, if you are using the anomaly detection single-metric wizard, the data will be aggregated for you so you do not need to manually modify the job to achieve aggregation of the data feed.
If you switch to the multi-metric wizard, the data will not be aggregated. However, multi-metric is the wizard that allows you to add influencers, which is not available in the single-metric wizard, so you may need to use multi-metric with only one metric if you want to track influencers.
After setting up your multi-metric job, click "Convert to advanced job" and select Next. You will see a choice for "Summary Count Field": select the field that contains your document count. Then select "Edit JSON". You need to edit both the Job Configuration JSON and the Datafeed JSON.
In the Job Configuration JSON, if you are using a "by" field, you need to add "by_field_name": "myfieldname" in the detectors section.
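As an illustrative sketch only (the field names `status` and `@timestamp` are placeholders I chose, not from the original post), the relevant part of the Job Configuration JSON might look like this, with the summary count field pointing at the document count produced by the aggregation:

```json
{
  "analysis_config": {
    "bucket_span": "1d",
    "summary_count_field_name": "doc_count",
    "detectors": [
      {
        "function": "count",
        "by_field_name": "status",
        "detector_description": "count by status"
      }
    ],
    "influencers": ["status"]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```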
In the Datafeed JSON, you need to add the aggregations section following the pattern in the article linked above.
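A minimal sketch of that pattern, assuming an index `my-index`, a time field `@timestamp`, and the 404 filter from the earlier example (all placeholder names): the datafeed needs a `date_histogram` on the time field, a nested `max` on the time field so ML can place each bucket in time, and a `terms` aggregation for the "by" field. `doc_count` is produced automatically by the aggregation:

```json
{
  "indices": ["my-index"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "404" } }
      ]
    }
  },
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1d"
      },
      "aggregations": {
        "@timestamp": {
          "max": { "field": "@timestamp" }
        },
        "status": {
          "terms": { "field": "status.keyword" }
        }
      }
    }
  }
}
```

The `fixed_interval` of the date histogram should be no larger than the job's `bucket_span`.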
The editor will help you align your brackets, and you can refresh the datafeed preview to see the aggregation level change.
Using this method you can aggregate your datafeed. However, if you want to use text fields, which are ineligible for aggregation, you will need to change your index mapping to add a field.keyword sub-field for your "by" fields before you can proceed.
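A sketch of that mapping change, assuming a text field named `status` on an index `my-index` (both placeholders). Adding a multi-field to an existing field is generally allowed, but note that only newly indexed documents pick up the sub-field; existing documents need a reindex or update-by-query to become searchable on it:

```json
PUT my-index/_mapping
{
  "properties": {
    "status": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
```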