How can one define additional context data (a set of key-value pairs that goes into every document), such that all data sent via a dedicated apm-server is extended with that data?
Background: We are using a system with a set of embedded computers and a large number of services running on them.
Only the main computer has a network connection to the ELK stack (currently a Logstash instance in the cloud).
But of course, this setup is replicated multiple times.
For test purposes in release operations, we are running marked/named testcases on those systems, and we want to compare the testcases run on different systems with each other.
So, for the metrics collection, we have already succeeded in collecting the data in the different subsystems with a service of our own, which also provides a gRPC/REST endpoint for receiving the test-case ID used to mark the data stream (each and every document sent to the database) with that context.
Now I would like to extend the data collection via the apm-server to also cover services based on .NET or JavaScript/web technologies...
Question: How can the apm-server be configured to accept a set of key-value pairs to append to each document sent to the ELK stack?
Note: I have already looked into Logstash, but its footprint is far too big to install and operate on such an embedded system.
I am interested in a solution with a small footprint.
To make sure I understand correctly, when you say "additional user-defined marker data", is it per-testcase data, per-apm-server data, or per-agent data?
For per-apm-server data, you may use data_streams.namespace so that events are routed to data streams with different namespaces. Also, is it possible to utilize observer.hostname in the document?
For per-agent data, you may consider global labels (e.g. in the Java agent). These are labels that are static and cannot be modified at runtime. They do not work for RUM agents, as global labels are stripped from aggregated metrics for the RUM agent; see the docs, which say: "labels: Key-value object containing string labels set globally by the APM agents. This dimension is not present for RUM agents."
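For illustration, here is a minimal sketch of global labels with the Elastic Node.js agent; the service name, server URL, and label values are hypothetical:

```typescript
// Minimal sketch (assumption: elastic-apm-node is installed).
// The agent must be started before any other modules are loaded.
import apm from 'elastic-apm-node';

apm.start({
  serviceName: 'embedded-service',          // hypothetical service name
  serverUrl: 'http://main-computer:8200',   // hypothetical apm-server address
  // Static key-value pairs attached to every event this agent sends.
  // Equivalent to the ELASTIC_APM_GLOBAL_LABELS environment variable,
  // e.g. ELASTIC_APM_GLOBAL_LABELS="system_id=rig-42,site=lab-1".
  globalLabels: { system_id: 'rig-42', site: 'lab-1' },
});
```

Since these labels are fixed at agent start, they fit per-system marking, but not a testcase ID that changes every few minutes.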
The test-case ID is sent via gRPC from a test server (in the cloud).
I'd like to mark all data belonging to one testcase so that it can be grouped together. The update rate is somewhere around every 3 minutes
(a very short testcase may run for only 1 minute, but on average it will be something around every 3-5 minutes).
So I'd like to have something like an endpoint on the apm-server where one could set additional enrichment data (key-value pairs), and the apm-server would extend/enrich all data it receives before pushing it to the database.
But, as mentioned above, realizing this as "enriching via lookups" in Logstash would lead to a huge performance bottleneck, since every message processed by Logstash would trigger a database query...
And the "labels" mentioned for the agents go in the right direction, but are not quite the best fit, since the code under test should not be modified (or only with minimal changes), and we will have at least two agents (.NET and Node.js).
As far as I'm aware, there's no endpoint in apm-server that configures enrichment within apm-server. There are some alternatives:
Use ES ingest pipelines instead of Logstash. That way, you don't have to deploy Logstash. To avoid enriching via lookups, have you considered a script that simply calls the ES API periodically to update an ingest pipeline that enriches APM data, e.g. based on observer.hostname?
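To illustrate, here is a minimal sketch of such a script in TypeScript (Node 18+, using the global fetch); the Elasticsearch URL, credentials, and pipeline name are assumptions:

```typescript
// Sketch: push the currently running testcase ID into an ES ingest pipeline,
// so every APM document passing through it gets a labels.testcase_id field.
const ES_URL = 'https://elasticsearch:9200';                                 // assumption
const PIPELINE = 'apm-testcase-enrichment';                                  // hypothetical pipeline name
const AUTH = 'Basic ' + Buffer.from('elastic:changeme').toString('base64'); // assumption

async function updatePipeline(testcaseId: string): Promise<void> {
  const body = {
    description: 'Attach the currently running testcase ID to APM documents',
    processors: [
      { set: { field: 'labels.testcase_id', value: testcaseId } },
    ],
  };
  const res = await fetch(`${ES_URL}/_ingest/pipeline/${PIPELINE}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json', Authorization: AUTH },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`pipeline update failed: ${res.status}`);
}

// Example: call this whenever the test server announces a new testcase ID.
updatePipeline('TC-2024-0815').catch(console.error);
```

For the pipeline to actually run, it would still have to be hooked into the APM data streams, for example via the @custom pipelines (e.g. traces-apm@custom).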
... but I am a bit confused (more precisely, I am afraid it won't work) about how you propose to apply this to / intercept the data stream, since some of the Elastic agent documentation pages describe the caveat that the baggage API is not properly supported...
So the idea seems brilliant, but it might be that the interesting data gets silently dropped along the way. (Or I am not properly seeing the intended structure...)
If I interpret you correctly, you would use an OTel Collector, which sends to some OTel server (which would handle the baggage), then on to the APM Server, which then writes to the ELK stack.
Sorry for the confusion. It seems that you are indeed using Elastic Node.js agents, and the OpenTelemetry bridge will allow you to use OTel APIs within your Node.js app while sending the Elastic intake v2 protocol to the apm-server directly. The OTel data_stream.* attributes option is not going to work in your setup.
Have you thought about the ES ingest pipeline option? E.g. APM server A has hostname "apm_server_a", and you set up custom ingest pipelines that use the reroute processor to route its events to a different namespace based on observer.hostname.
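A minimal sketch of such a custom pipeline, again created via the ES API with Node 18+ fetch; the hostnames, namespaces, and credentials are hypothetical, and the traces-apm@custom hook assumes a recent stack version with the reroute processor available:

```typescript
// Sketch: a traces-apm@custom pipeline that reroutes documents to a namespace
// derived from which apm-server (observer.hostname) shipped them.
const ES_URL = 'https://elasticsearch:9200';                                 // assumption
const AUTH = 'Basic ' + Buffer.from('elastic:changeme').toString('base64'); // assumption

const pipeline = {
  description: 'Route APM traces to a per-system namespace based on observer.hostname',
  processors: [
    // One reroute processor per apm-server / system (hostnames are hypothetical).
    { reroute: { if: "ctx.observer?.hostname == 'apm_server_a'", namespace: 'system_a' } },
    { reroute: { if: "ctx.observer?.hostname == 'apm_server_b'", namespace: 'system_b' } },
  ],
};

fetch(`${ES_URL}/_ingest/pipeline/traces-apm@custom`, {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json', Authorization: AUTH },
  body: JSON.stringify(pipeline),
})
  .then((res) => {
    if (!res.ok) throw new Error(`pipeline update failed: ${res.status}`);
  })
  .catch(console.error);
```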