I'm using the prometheusreceiver in a Deployment to scrape our pods every 60s; the metrics are then sent via OTLP to another instance of the collector running as a DaemonSet, which then sends them on to Elastic Cloud. What I'm finding is that the metrics are not coming through at regular 60s intervals.
For example, if there are 3 pods, each minute between 0 and 3 records of a metric are indexed, which suggests that they are being dropped at some point.
In the collector that is performing the scrapes I can see that they are being performed regularly, so I'm starting to think that there is an issue in the gateway collector or at the Elastic Cloud end.
Is there a way to confirm that the gateway is performing as expected, so that I can eliminate that part of the chain?
Ah looking at the internal metrics on the gateway collector I can see that the otelcol_exporter_send_failed_metric_points counter is increasing all the time.
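In case it helps anyone else doing the same check: the collector exposes these internal metrics in Prometheus format on port 8888 by default, and the detail level can be raised in the service section. The telemetry config keys have moved around between recent collector releases, so treat this as a rough sketch for the 0.11x/0.12x range rather than exact syntax:

# Gateway collector config (sketch only) - expose internal telemetry so the
# otelcol_exporter_* counters can be checked, e.g. via
# kubectl port-forward <gateway-pod> 8888 and then curl http://localhost:8888/metrics
service:
  telemetry:
    metrics:
      level: detailed          # more exporter/queue detail than the default level
      address: 0.0.0.0:8888    # older-style setting; newer releases use "readers" instead

Comparing otelcol_exporter_sent_metric_points with otelcol_exporter_send_failed_metric_points on that endpoint gives a quick read on whether the gateway's exporter is the culprit.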
If you put in the component template, it will not take effect until the data stream rolls over, which could take some time.
You can try going to Kibana - Dev Tools and running:
POST metrics-otel-default/_rollover
When you do that it will create a new backing index which will use the component template... that said, because this is a TSDS, the new metrics will not start flowing into that new backing index for ~30 mins (long explanation left out).
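If you want to see why the new backing index doesn't receive data straight away, you can look at the time-series bounds Elasticsearch assigns to it (the backing index name below is only an example, take the real one from the data stream listing):

GET _data_stream/metrics-otel-default

# then check the time-series bounds on the newest backing index, e.g.
GET .ds-metrics-otel-default-2025.05.01-000002/_settings?filter_path=*.settings.index.time_series

Roughly: incoming documents keep being routed to the old write index until the clock passes the new index's index.time_series.start_time, which is where the ~30 mins above comes from.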
Plus, you did not share your configuration from either collector...
And I see you are using version 0.112... I think the latest is 0.122.
Yes, the index has rolled over already; the ILM policy that includes this data stream is set to roll over daily.
In my support case I shared all 3 relevant configs used by the in-cluster operator.
So I'm trying to follow the documented and supported versions... The generated values file that is passed into the Helm chart sets the image to docker.elastic.co/beats/elastic-agent:8.16.5, which matches the Elastic cluster version.
Does otel/opentelemetry-collector-contrib:0.122.1 have the same distribution config as the elastic-agent image?
This is a public forum, so we do not have access to your support case, but good, it sounds like you are working with support...
Might be worth a try...
And of course, as I am sure you know, Elastic Agent in OTel mode is still in technical preview...
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features
We absolutely appreciate you using/trying it and reporting issues!
Just to follow up, as I addressed this with the developer of the exporter:
Using otel-collector-contrib:0.122.0 or otel-collector-contrib:0.124.0, I'm now seeing no errors about duplicate metrics. Our hosted Elastic cluster is running 8.16.6.
Ultimately we needed to add a component template.
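Going by the _metric_names_hash discussion below, the key part is mapping that field as a time-series dimension; something along these lines (illustrative sketch, not a copy of what we applied, and the template name in particular may differ in your setup - it needs to be one your data stream's index template actually composes, e.g. the @custom slot):

PUT _component_template/metrics-otel@custom
{
  "template": {
    "mappings": {
      "properties": {
        "_metric_names_hash": {
          "type": "keyword",
          "time_series_dimension": true
        }
      }
    }
  }
}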
Then roll over the data stream to use a new backing index (docs don't get routed into the new index for ~30m) with:
POST metrics-prometheusreceiver.otel-default/_rollover
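To confirm the rollover took and see which backing index is now the write index, this should be enough:

GET _data_stream/metrics-prometheusreceiver.otel-default
# the last entry in "indices" should be the freshly created write index,
# and "generation" should have incremented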
Although _metric_names_hash is indexed dynamically, I think it'll be missing time_series_dimension: true on that field.
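If you want to double-check what the dynamic mapping actually produced there, something like this should show it:

GET metrics-prometheusreceiver.otel-default/_mapping/field/_metric_names_hash
# if the workaround is in place you should see "time_series_dimension": true in the output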
Also, I think you've helped us discover a bug in releasing this _metric_names_hash workaround.
Yes, however our hosted cluster is on 8.16.6, so it shouldn't have needed the custom component template, based on the wording of the conditions about when you would need to add it.
I think this is what foxed Carson in the first instance because it looked like it should have been fine.