Can we use the APM UI without ingestion via OpenTracing or APM Server, and use logs instead?

I know that the expected way to ship data to APM is through an agent or the HTTP API, and that the APM Server transforms the requests into Elasticsearch documents, the format of which is described here: https://www.elastic.co/guide/en/apm/server/6.1/generated-docs.html

We already ship JSON logs to Elasticsearch and would like to use the APM UIs to visualize HTTP requests/responses, distributed transactions, and so on. However, we use a mix of Erlang and Python and would prefer not to use the (Python) agent or write our own HTTP emitters. Would it be possible to emit JSON logs in the transformed format, since these are already being shipped to our ES cluster? This would greatly ease the integration process and alleviate concerns around the potential impact of agents (ours or community ones) in production.

My recommendation here is to send data to an APM Server and let it forward the data to Elasticsearch. That way you take advantage of its ingest capabilities, such as some form of buffering, pipeline management, and security (Elasticsearch only has to talk to the APM Server, so there is no need to open it up to a wider network unnecessarily). You can opt not to use an agent and just send the minimum payload you can, but that means:

  1. You take care of trace and transaction ID generation and propagation
  2. You implement some form of payload management in your Erlang and Python code
  3. You have to routinely look out for bug and security fixes and implement those yourself
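
To give a sense of what point 1 involves, here is a minimal sketch of generating and propagating IDs by hand. It assumes Python 3.6+ (for `secrets`), 128-bit trace IDs, and 64-bit transaction IDs. The header name `X-Trace-Context` and its `trace-parent` layout are hypothetical, not something the APM Server or the agents define.

```python
import secrets

# Hypothetical header name; every service in the call chain must agree on it.
TRACE_HEADER = "X-Trace-Context"


def new_trace_id():
    """Return a random 128-bit trace ID as 32 hex characters."""
    return secrets.token_hex(16)


def new_transaction_id():
    """Return a random 64-bit transaction ID as 16 hex characters."""
    return secrets.token_hex(8)


def extract_or_start_trace(headers):
    """Continue the caller's trace if a valid header is present, else start one."""
    trace_id = None
    parent_id = None
    value = headers.get(TRACE_HEADER)
    if value:
        parts = value.split("-")
        # Ignore malformed values rather than failing the request.
        if len(parts) == 2 and len(parts[0]) == 32 and len(parts[1]) == 16:
            trace_id, parent_id = parts
    if trace_id is None:
        trace_id = new_trace_id()
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "transaction_id": new_transaction_id(),
    }


def outgoing_headers(ctx):
    """Headers to attach to downstream requests so the next service joins the trace."""
    return {TRACE_HEADER: "{}-{}".format(ctx["trace_id"], ctx["transaction_id"])}
```

Every service, including the Erlang ones, would need an equivalent of these two functions and would have to call them on every inbound and outbound request.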

What are your concerns regarding the Elastic APM Python agent? How would logging traces to a file alleviate them?

We use two custom frameworks based on Tornado, one using Python 2.7 and one using Python 3.6, so we would likely have to use either https://github.com/laerteallan/apm-agent-python-tornado or refer to the PR to add Tornado to the official Python agent library. However, bigger than the integration effort is the risk that tracing would impact production requests for some unknown reason. We have modified Tornado itself to support proxies with AQM, so it is not standard, and when adding a background HTTP request to the ioloop we'd need to be careful that it was bounded in terms of performance and did not affect our custom backpressure / AQM code. In general the company is risk-averse, and we would need time to load-test the impact on these apps to 'prove' that the cost of the agent could be controlled.

I see there is a failsafe to disable instrumentation, which is great. I think we'd also appreciate limits on data rates per instance of the agent (in addition to the sampling %).

The logging approach I mentioned has its own issues, but it can be evaluated on paper before we do an integration, since we can calculate the logging load required.
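
As an illustration of that kind of on-paper estimate, here is a minimal sketch. The function name and the input figures are made up for the example; real values would come from our own traffic and log samples.

```python
def logging_load(requests_per_second, events_per_request, bytes_per_event):
    """Estimate the extra log volume produced by trace events."""
    bytes_per_second = requests_per_second * events_per_request * bytes_per_event
    return {
        "bytes_per_second": bytes_per_second,
        "mib_per_second": bytes_per_second / (1024.0 * 1024.0),
        "gib_per_day": bytes_per_second * 86400 / (1024.0 ** 3),
    }


# Illustrative figures only: 1024 req/s, 4 events per request, 256 bytes each.
estimate = logging_load(1024, 4, 256)
print(estimate["mib_per_second"])  # 1.0
print(estimate["gib_per_day"])     # 84.375
```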

In the end the APM data is, as we say, "just another index". In theory, you could write a Filebeat or Logstash module that ingests your JSON logs, transforms them into the APM document format, and indexes them into Elasticsearch.
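
A rough sketch of the transformation step, written in Python only to show the shape of the mapping (a real pipeline would live in Logstash or an ingest step). The input field names (`ts`, `request_id`, `method`, `path`, `status`, `duration_ms`) are hypothetical, and the output layout is only an approximation of a transaction document. It would have to be checked field by field against the generated docs for your APM Server version, since the UI depends on the exact mapping.

```python
import json


def log_to_transaction_doc(line, service_name):
    """Map one JSON log line to an APM-like transaction document (assumed layout)."""
    event = json.loads(line)
    return {
        "@timestamp": event["ts"],
        # Assumed marker fields; verify against the documented format.
        "processor": {"name": "transaction", "event": "transaction"},
        "context": {"service": {"name": service_name}},
        "transaction": {
            "id": event["request_id"],
            "name": "{} {}".format(event["method"], event["path"]),
            "type": "request",
            "result": "HTTP {}xx".format(event["status"] // 100),
            # Durations are converted from milliseconds to microseconds.
            "duration": {"us": int(round(event["duration_ms"] * 1000))},
            "sampled": True,
        },
    }
```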

If you did use apm-agent-python-tornado or similar: the Python agent has a configurable transport, which you could implement in such a way that it logs to disk rather than sending over HTTP. You would need to take care of buffering to ensure that logging doesn't then introduce bottlenecks in your application, which can quite easily happen in a hot code path. Also, I don't think the transport interface is part of the agent's stable API, so it could make upgrades a little more troublesome.
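
To illustrate the buffering concern, here is a minimal sketch using only the Python 3 standard library (`logging.handlers.QueueHandler` and `QueueListener`). It does not use the agent's transport interface. `DroppingQueueHandler`, `build_trace_logger`, and the `trace-events` logger name are hypothetical names for this example. The idea is that the request path only does a non-blocking put onto a bounded queue, a background thread does the disk writes, and events are dropped (and counted) when the queue is full.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """QueueHandler that drops records instead of blocking when the queue is full."""

    def __init__(self, q):
        super().__init__(q)
        self.dropped = 0  # Number of records discarded because the queue was full.

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1


def build_trace_logger(path, max_pending=10000):
    """Return (logger, handler, listener); the listener writes to `path` in a thread."""
    q = queue.Queue(maxsize=max_pending)
    handler = DroppingQueueHandler(q)
    listener = logging.handlers.QueueListener(q, logging.FileHandler(path))
    logger = logging.getLogger("trace-events")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Keep trace events out of the application log.
    logger.addHandler(handler)
    listener.start()
    return logger, handler, listener
```

The caller would need to call `listener.stop()` at shutdown to flush pending records, and could export `handler.dropped` as a metric to see when the bound is being hit.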

I don't know of anyone doing this, so I can't provide any references unfortunately.

I see there is a failsafe to disable instrumentation, which is great. I think we'd also appreciate limits on data rates per instance of the agent (in addition to the sampling %).

There are a couple of things we've been investigating which I think will help here:

  1. agent-side aggregation for non-sampled transactions
  2. rate-limit configuration for sampling, to enable you to specify sampling in terms of transactions per second

Together these would mean the amount of data sent would be proportional to your sampling rate, rather than the transaction rate. Is this along the lines of what you had in mind?
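
To make the second idea concrete, here is a minimal token-bucket sketch of sampling expressed as transactions per second. It is an illustration of the concept only, not the agent's implementation or configuration, and `RateLimitedSampler` is a hypothetical name.

```python
import time


class RateLimitedSampler:
    """Sample at most `max_per_second` transactions per second (token bucket)."""

    def __init__(self, max_per_second, clock=time.monotonic):
        self.rate = float(max_per_second)
        self.capacity = float(max_per_second)  # Allows a burst of up to one second.
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def should_sample(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With this shape the volume of sampled data is bounded by `max_per_second` regardless of how much traffic an instance handles. The sketch is not thread-safe; a real version would need a lock or per-thread state.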
