we are currently in the process of migrating from Splunk to Elasticsearch.
We have most of the data now in our ELK Cluster and are in the process of
recreating the Dashboards and visualizations necessary.
In Splunk we currently have so called "LookUp" tables which are static lists (which may get regenerated once per week)
containing processes or queues that are should be present through out our systems.
Based on this list, the teams can query the data present in the system and quickly see if we have the correct amount of processes
on a specific host (or which is missing), the correct number of queues and things like that.
Also it allows us to create flexible alerts.
I searched through the forums and the documentation and found a lot of questions regarding "SplunkLookup tables" ending up solved with enrichment policies and similar approaches.
But in this case this seems not to be quite fitting.
Because these lists can change and are quite large (~500 hosts, ~400 queues and so on) it is not that easy to simply create an alert or dashboard by hand for each.
Lets take as a example the following:
we have a Lookup table / or lookup index containing mandatory hosts that sould be present in our system: "host1, host2, host3"
based on that we would like to achieve something like that as a secondary index or temp index:
This would be currently required to be visualized on a dashboard.
On top of that we would need to send an alert if the count would be < 0 or just no data is present for a certain timeframe.
We could of course create such an alerting rule by hand currently, but as said we have quite a lot of this lookups currently which also are constantly changing.
I hope someone had an simmilar issue and might share some ideas how to migrate this to elasticsearch!
Maybe we simply missing a totaly obvious solution to this.
Regarding the "ENRICH" function: From the documentation I read it states: " The ES|QL ENRICH command only supports enrich policies of type match . Furthermore, ENRICH only supports enriching on a column of type keyword ."
So therefor to me it seems I can not simply run a "COUNT" like query and add the result, right?
Thats maybe exactly the point where we might missing something. How would the approach be for the ENRICH here?
Assuming we get this index updated on a regular daily basis, containing a high and volatile number of hosts.
Based on that we would like to join / enrich per hosts a field that contains the count of an ES|SQL or some other query like:
from metrics* | where host.name === *our index hosts*
If that is not possible we would be grateful for any other way we could determine on a live basis that we retreive data from all hosts listed in this index. (without hard wiring the host.name into alerts by hand)
Besides the host lookup, we use this pattern currently also for other datasources. So we are looking for some kind of general approach here.
Here it would be fundamental for us to see also the hosts with Zero metrics.
Hence we trying to implement based on a lookup table / enrich construct.
We allready tried the enrich policys but we are currently only able to match static values.
We are not able to enrich dynamic values like the count metrics based on host.name
So the situation is as follows:
We have a list of hosts in a lookup table (including fqdn)
We have converted that lookup table into an enrich index via an enrich policy
Now we want a count of ALL metrics by host in this lookup table (now enrich index) INCLUDING hosts that haven't sent any data. For those hosts the count would be 0.
Based on the article we also don?t see the possibility to include hosts that haven't sent any data in a specific timeframe, we would need to see those as well.
We do not want to simply enrich existing documents in a query with some extra fields (fqdn in your example, we have that already), we're basically trying to do an SQL OUTER JOIN (metric index on lookup index).
Checking each host individually is not a possibility in our case as the count of hosts we're monitoring is in the three digits.
No the the OUTER join as you describe it is not available as far as I know.. perhaps in the future...
(I'll noodle on it a bit more... But pretty sure there is no natural way to do that today)
But if you are simply trying to figure out which hosts have not reported in in the last given time frame?
And you are using metrics to do that? Are you collecting logs from the host?
If so there is another way... Using a latest transform.
I have used it with many customers to support this / similar use case..
Example
You setup latest transform and then it it easy to find any host that has not reported in in say last 1.. 24 hour etc.. BTW I had a customer use this with a host inventory of 100K hosts
Latest Transforms can be used for a lot of things: host last reported in, user last login ...,ots of use cases.
thank you for the recommendation with the "latest transform".
The last days we did some experiments with it and now are able to get our host lookup table working.
We are combining the "latest transform" with the enrichment.
So we use the latest transform on the standard "metrics-*" dataview and transform it to an new index based on "host.hostname" and "@timestamp".
On our "lookup-hosts" index we perform an enrich based on the transformed index. Now we have the desired result with the latested timestamp on each host we received metrics from and an "null" timestamp on the host we never received data from.
Thank you for you tips and help.
Just wondering if your "I'll noodle on it a bit more" might brought up some other ways we might use for our other use cases and might be a bit more elegant?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.