I have an 8.8.x Elastic deployment with ~1,000 Metricbeat instances monitoring systems that support a few hundred products across our enterprise. Each Metricbeat instance is auto-provisioned (not using Fleet) and is configured with a custom field called "product".
I have a requirement to generate a notification when a new product is detected in our Metricbeat indices, or when an existing product is no longer receiving data.
My initial thought was that I'd have to write external code to periodically (daily?) query my Metricbeat indices to build a baseline list of products, then periodically (hourly?) run new queries and compare the results with the baseline.
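For reference, the external-code approach could start from a terms aggregation that lists the distinct products seen in a time window. This is a minimal sketch of the search body and the response parsing only, assuming the custom field ends up at `fields.product` (Metricbeat's default when `fields_under_root` is false) and that your Metricbeat indices carry `@timestamp`. Note that a terms aggregation returns only 10 buckets by default, so `size` has to exceed your number of products.

```python
# Assumed field path: with Metricbeat's `fields:` setting and the default
# fields_under_root: false, a custom field is stored under `fields.`.
PRODUCT_FIELD = "fields.product"


def build_products_query(lookback="1d", size=1000):
    """Search body: distinct products seen within the lookback window."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"@timestamp": {"gte": f"now-{lookback}"}}},
        "aggs": {"products": {"terms": {"field": PRODUCT_FIELD, "size": size}}},
    }


def products_from_response(response):
    """Extract the set of product names from the search response."""
    return {b["key"] for b in response["aggregations"]["products"]["buckets"]}
```

With a daily baseline set and an hourly current set, `current - baseline` gives the new products and `baseline - current` the ones that stopped reporting.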
Is this a reasonable approach? How would others tackle this? Could I somehow use Transforms to keep all the processing internal to Elastic?
Hmm, let's think about this... I commonly use the latest transform to find out when hosts, apps, and services go away, and it works great for that. I would need to think about how to use it for new ones. You could probably detect an increase in the count fairly easily; identifying what is new might be more difficult.
For products that are no longer sending data, you can definitely use a latest transform: group by hostname, sort by timestamp to keep the latest entry, and then alert when that latest entry is older than, say, 24 hours (or whatever threshold you choose; I have done this many times and it works great).
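A minimal sketch of that idea adapted to products, assuming the index pattern `metricbeat-*`, a hypothetical destination index `product-latest`, and the custom field at `fields.product` (all assumptions). The dict is a latest transform body (`unique_key` plus `sort`); the function shows the staleness check an alert would perform on the destination documents. In practice the alert would more likely be a range query on `@timestamp` against the destination index, so treat the Python as an illustration of the logic only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical latest transform: keep only the most recent Metricbeat
# document per product. Index names and the field path are assumptions.
LATEST_TRANSFORM = {
    "source": {"index": ["metricbeat-*"]},
    "dest": {"index": "product-latest"},
    "latest": {"unique_key": ["fields.product"], "sort": "@timestamp"},
    "frequency": "1m",
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
}


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2023-06-01T08:00:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_products(latest_docs, now, max_age=timedelta(hours=24)):
    """Return products whose newest document is older than max_age."""
    return sorted(
        doc["fields"]["product"]
        for doc in latest_docs
        if now - parse_ts(doc["@timestamp"]) > max_age
    )
```

Because the destination index holds exactly one document per product, every product ever seen stays there with its last timestamp, which is what makes "older than 24 hours" a reliable signal that a product went quiet.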
For new products, I think you will need a pivot transform grouped by host.name with a cardinality or value count aggregation on a field, and then alert when that value changes.
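One way to read that suggestion, as a sketch only: a pivot transform grouped by `host.name` with a `cardinality` aggregation on the product field and a `value_count` on `@timestamp`, plus a comparison of two snapshots of the per-host counts. The destination index name, the group-by name `host_name`, and the `fields.product` path are assumptions.

```python
# Hypothetical pivot transform: one destination document per host with the
# number of distinct products and events seen for it.
PIVOT_TRANSFORM = {
    "source": {"index": ["metricbeat-*"]},
    "dest": {"index": "host-product-pivot"},
    "pivot": {
        "group_by": {"host_name": {"terms": {"field": "host.name"}}},
        "aggregations": {
            "product_count": {"cardinality": {"field": "fields.product"}},
            "event_count": {"value_count": {"field": "@timestamp"}},
        },
    },
    "frequency": "1m",
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
}


def count_changes(previous, current):
    """Compare two {host: product_count} snapshots; return changed hosts."""
    return {
        host: (previous.get(host, 0), count)
        for host, count in current.items()
        if previous.get(host, 0) != count
    }
```

A changed count says that something changed on a host, not which product is new, which is the difficulty mentioned earlier in the thread.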
It is possible that you could use the pivot transform for both...
If I get a chance, I will try to look more.
Warning: there is a bug in the Transform UI that makes it very sluggish. It does work, but the drop-down lists can be very frustrating...
If you are comfortable with the API, you can create the transform via the API instead.
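As a rough sketch of the API route using only the Python standard library: create the transform with `PUT _transform/<transform_id>`, then start it with `POST _transform/<transform_id>/_start`. The base URL and transform ID are placeholders, and authentication (API key or basic auth headers) is deliberately left out; the functions only build the requests.

```python
import json
import urllib.request


def create_transform_request(base_url, transform_id, body):
    """Build (but do not send) a PUT _transform/<id> request."""
    return urllib.request.Request(
        url=f"{base_url}/_transform/{transform_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def start_transform_request(base_url, transform_id):
    """Build (but do not send) a POST _transform/<id>/_start request."""
    return urllib.request.Request(
        url=f"{base_url}/_transform/{transform_id}/_start",
        method="POST",
    )
```

Passing each request to `urllib.request.urlopen` sends it; in Kibana the same two calls can be pasted into Dev Tools as `PUT` and `POST` requests.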
Yes, your "latest transform" suggestion sounds perfect for when product data is no longer coming in.
As for new data, I created two pivot transforms that group by "product": one with a 1-minute frequency and the other with an hourly frequency. My idea is to develop a Watcher that compares the two destination indices every minute or so and alerts on the differences. In theory, the hourly index should always lag behind the 1-minute index. If this works, it should cover both new data and missing data.
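The comparison itself is a set difference on the group-by key. This Python sketch shows only that logic; a Watcher would typically fetch both indices (for example with a chain input) and do the equivalent in a Painless condition. It assumes each destination document carries the group-by key as `product`, and that both searches use a `size` large enough to return every product (the default is 10 hits).

```python
def products_in(hits):
    """Collect the group-by key from each transform destination document."""
    return {hit["_source"]["product"] for hit in hits}


def compare_indices(fast_hits, slow_hits):
    """Compare the 1-minute index (fast) with the hourly index (slow)."""
    fast, slow = products_in(fast_hits), products_in(slow_hits)
    return {
        # Seen by the 1-minute transform but not yet by the hourly one.
        "new": sorted(fast - slow),
        # Only non-empty if products can disappear from the 1-minute index,
        # which depends on whether old documents stay in the destination index.
        "missing": sorted(slow - fast),
    }
```

The "missing" direction only works if a product's document actually drops out of the 1-minute index when the product stops reporting.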
Unfortunately, my Watcher skills are a bit rusty, so my challenge now is figuring out how to compare the products in the 1-minute index with the products in the hourly index.
I may be misunderstanding how pivot transforms work. Can you confirm whether pivot transforms only show current data? If old data stays in the index, my idea isn't going to work.