Big JSON Payloads in Elastic APM

Hi there,

We are using Elastic for Logs and APM in our application. To debug larger JSON payloads sent via REST between microservices, we currently log selected payloads to a capped MongoDB collection. I saw the captureBody config option, which allows the same for APM and would be much more convenient; however, the req.body "will be truncated if larger than 2 KiB.", which is understandable for performance reasons.

As I am talking about JSON payloads between 0.3 and 10 MB, what would be the best approach here? Is this a use case Elasticsearch can be used for?

Thanks!

Hi @rStorms,

I believe that "will be truncated if larger than 2 KiB." is out of date. (I'll open an issue to fix that doc.) The actual default truncation limit for a captured incoming request body is 10,000 characters, configurable via the longFieldMaxLength config var (Configuration options | APM Node.js Agent Reference [4.x] | Elastic).

While you could raise longFieldMaxLength to, say, 10 million characters to accommodate your larger request payloads, note that this config var applies to other captured fields as well. That may or may not be a concern for your app.
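For illustration, raising both limits at agent startup might look like the sketch below. The option names come from the 4.x configuration docs linked above; serviceName and the values are placeholders.

// The agent must be started before any other modules are loaded.
// captureBody and longFieldMaxLength are real elastic-apm-node 4.x
// options; the values here are illustrative only.
const apm = require('elastic-apm-node').start({
  serviceName: 'my-service',            // placeholder
  captureBody: 'all',                   // capture request bodies (default is 'off')
  longFieldMaxLength: 10 * 1000 * 1000  // raise truncation to ~10M characters
})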

I would worry about possible performance issues in the Node.js APM agent with 10 MB captured bodies: the agent has to serialize that request body as a string in its JSON payload to the APM server.

You said you "selectively log" some request payloads. You could possibly reproduce this selectivity by using the APM agent's apm.addTransactionFilter(...) API (Agent API | APM Node.js Agent Reference [4.x] | Elastic).
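For example, here is a rough sketch of that kind of selectivity, assuming you only want to keep bodies for certain routes (the /api/orders check is purely illustrative):

const apm = require('elastic-apm-node') // agent assumed already started

// Illustrative filter: strip the captured request body from every
// transaction except the ones we explicitly want to keep. The filter
// receives the payload as it will be sent to the APM server.
apm.addTransactionFilter(function (payload) {
  const req = payload.context && payload.context.request
  const keep = req && req.url && req.url.pathname === '/api/orders' // made-up condition
  if (req && !keep) {
    delete req.body
  }
  return payload // returning a falsy value would drop the transaction entirely
})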

Then the next issue in the pipeline will be the maximum event size that the APM server/integration will accept, as documented at APM input settings | APM User Guide [8.4] | Elastic:

Maximum size per event (int)
Maximum permitted size of an event accepted by the server to be processed (in Bytes).

Default: 307200 Bytes
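If you run a standalone APM Server, that limit is the max_event_size setting in apm-server.yml; the Fleet-managed APM integration exposes the same "Maximum size per event" knob in the integration policy. A sketch with an illustrative value:

# apm-server.yml -- raise the per-event limit from the 307200-byte default
apm-server:
  max_event_size: 10485760  # ~10 MB; size this to your largest expected event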

General recommendations | Elasticsearch Guide [8.11] | Elastic includes a general recommendation to avoid large documents in Elasticsearch. It starts by discussing 100 MB documents there. I don't have personal experience here, so I can't say whether some number of 10 MB documents might be problematic.

My gut feeling is that there might be real performance concerns here: in the APM agent itself, and from adding (possibly many) large string fields to the Elasticsearch data streams (Data streams | Elasticsearch Guide [8.11] | Elastic) created for APM data. As well, depending on if and how you want to search over the captured request bodies, you may want control over the Elasticsearch index templates for this data. If so, a better design may be to send that data explicitly to a data stream or index that is separate from the ones used by the APM system.
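To illustrate that last idea: you could write the large bodies directly to a dedicated data stream with the Elasticsearch client and keep only the trace ID in APM for correlation. A minimal sketch, where the data stream name and endpoint are made up:

const { Client } = require('@elastic/elasticsearch')
const apm = require('elastic-apm-node') // agent assumed already started

const client = new Client({ node: 'https://localhost:9200' }) // placeholder endpoint

// Hypothetical helper: store a large request body in a separate data
// stream, keyed by the current APM trace ID for correlation.
async function storeLargeBody (body) {
  const traceId = apm.currentTraceIds['trace.id'] // ID of the active trace, if any
  await client.index({
    index: 'logs-payloads-default',  // made-up data stream name
    document: {
      '@timestamp': new Date().toISOString(),
      trace: { id: traceId },        // lets you pivot from the APM trace to the body
      payload: JSON.stringify(body)
    }
  })
}

If you rarely search the body content itself, mapping the payload field with index: false in that stream's index template would keep the indexing overhead down.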

Hi @trentm,

Wow, this is great advice and confirms my gut feeling. I will look into using a separate data stream and maybe linking the data another way.

Thanks!
