Hi,
I am currently running an ELK cluster, version 7.3.1, in a Kubernetes environment. I also use the Jolokia module in Metricbeat to gather many metrics. I wonder if anyone can help me with a Jolokia autodiscovery problem I have in recent versions.
Description: In the past I used Kubernetes autodiscovery together with the Jolokia module to detect changes in the Kubernetes cluster (new services, removal of old services, and so on) in order to collect the required JMX metrics from them. This combination worked perfectly until version 7.3.1. Now there are often stale references inside Metricbeat when Jolokia loses the reference to deleted services. Data keeps coming in, but the lost references accumulate over time. New services are detected by Jolokia without problems; the issue occurs only when a service is deleted from the Kubernetes cluster.
I am aware that the Jolokia module now has its own autodiscovery. Can someone give me a clue how to configure the Jolokia autodiscovery parameters (interface name, grace period, probe timeout) for the case where Metricbeat runs inside a Kubernetes cluster with services that expose metrics via Jolokia? Thanks in advance.
This is the part of the Metricbeat configuration covering the use of Kubernetes autodiscovery together with the Jolokia module. It worked correctly on previous versions of Metricbeat. We also tried Metricbeat 7.5, but the problem did not go away.
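Roughly, the relevant part looks like this (a sketch with placeholder values; the annotation condition, port, namespace, and JMX mappings are illustrative, not the exact values from my cluster):

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - condition:
            contains:
              kubernetes.annotations.jolokia: "true"   # hypothetical annotation marking Jolokia-enabled pods
          config:
            - module: jolokia
              metricsets: ["jmx"]
              hosts: ["${data.host}:7777"]
              namespace: "my_app"                      # placeholder namespace
              path: "/jolokia/"
              jmx.mappings:
                - mbean: "java.lang:type=Memory"
                  attributes:
                    - attr: HeapMemoryUsage
                      field: memory.heap_usage
```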
I also forgot to post the error from the Metricbeat logs when the Jolokia module loses the reference to a service from which it pulls metrics:
INFO module/wrapper.go:252 Error fetching data for metricset jolokia.jmx: error making http request: Post http://<some_ip>:7777/jolokia/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
This happens when I delete the pods from which the Jolokia module pulled metrics.
If some other service comes into existence in the meantime on the same IP, Jolokia starts to pull metrics from this new service. This can create serious problems, because the metrics of one service end up presented as the metrics of a completely different service. Why the Jolokia module is not aware of the removal of services from the Kubernetes cluster is something I have tried but failed to explain. I am guessing the new Jolokia autodiscovery provider should be used, but it lacks proper documentation for the case where Metricbeat and the Jolokia module are used inside a Kubernetes cluster. Any help?
Is it possible that this problem exists in a related form on other Kubernetes setups that use Metricbeat as the primary source of metrics? Could we at least share problems that belong to this set of challenges? Maybe that could lead us to a solution.
It seems to me there is some problem with Kubernetes autodiscover: when a service is deleted from the k8s cluster, autodiscover doesn't detect that. @exekias, do you know of any known issue around this?
If you are using Kubernetes, I recommend you continue using Kubernetes autodiscover; it will work for all applications you deploy in Kubernetes, whether they are Java applications or not. Jolokia autodiscover is intended for deployments where no other supported orchestrator is used.
Regarding the errors when pods are deleted, I think they are caused by the grace period Beats keep before they stop monitoring a stopped pod. This is especially useful in Filebeat, where the logs may not have been fully read when the pod stops, but it is true that its usefulness in Metricbeat is more limited. You can disable this grace period by setting cleanup_timeout: 0s in your autodiscover provider configuration.
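In context, the setting goes on the provider itself (sketch; only the relevant keys shown):

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 0s   # disable the grace period for stopped pods
```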
Hi, thanks for your suggestion. I added cleanup_timeout: 0s inside the add_docker_metadata section of the Metricbeat config file. But after restarting a few Kubernetes pods I get:
INFO module/wrapper.go:252 Error fetching data for metricset jolokia.jmx: error making http request: Post http://<some_ip>:7777/jolokia/: dial tcp <some_ip>:7777: connect: no route to host
add_kubernetes_metadata shouldn't be needed in your case; modules instantiated by autodiscover already add this metadata. add_kubernetes_metadata is needed for events collected by static configurations (unlike the ones created by the autodiscover provider).
cleanup_timeout should be at the same level as type and templates; from the config you copied, it seems to have one extra indentation level. It should be like this:
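A sketch of the corrected placement (the template condition and module settings below are illustrative placeholders):

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      cleanup_timeout: 0s   # same indentation level as type and templates
      templates:
        - condition:
            equals:
              kubernetes.labels.app: "my-app"   # placeholder condition
          config:
            - module: jolokia
              metricsets: ["jmx"]
              hosts: ["${data.host}:7777"]
```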