We are using Elastic for logs and APM in our application. To debug larger JSON payloads sent via REST between microservices, we selectively log them to a capped MongoDB collection. I saw the captureBody config option, which allows the same for APM and would be much more convenient. However, according to the docs, the req.body "will be truncated if larger than 2 KiB", which is understandable for performance reasons.
Since I am talking about JSON payloads between 0.3 and 10 MB, what would be the best approach here? Is this a use case Elasticsearch can be used for?
I believe that "will be truncated if larger than 2 KiB" is out of date. (I'll open an issue to fix that doc.) The actual default truncation limit for a captured incoming request body is 10k characters, configurable via the longFieldMaxLength config var (see "Configuration options" in the APM Node.js Agent Reference [4.x]).
While you could raise that longFieldMaxLength variable to, say, 10M to accommodate your larger request payloads, the config var applies to other fields as well. That may or may not be a concern for your app.
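For illustration, here is a minimal sketch of what raising that limit might look like. It only builds the options object; captureBody and longFieldMaxLength are the agent options discussed above, while buildApmOptions and the 10,000,000 figure are placeholders made up for this example.

```javascript
// Sketch: options one might pass to the Node.js APM agent's start() call.
// The helper name and the chosen limit are assumptions for illustration only.
const TEN_MILLION_CHARS = 10 * 1000 * 1000;

function buildApmOptions(maxFieldChars) {
  return {
    // Capture request bodies for both errors and transactions.
    captureBody: 'all',
    // Truncation limit, in characters. Note that this applies to other
    // long fields as well, not only to the captured request body.
    longFieldMaxLength: maxFieldChars,
  };
}

const options = buildApmOptions(TEN_MILLION_CHARS);
console.log(options);

// In an app with elastic-apm-node installed, this would be passed along
// the lines of:
//   require('elastic-apm-node').start(options);
```

The same values can presumably be set through the agent's other configuration mechanisms; check the configuration reference for the exact names.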
I would worry about possible performance issues in the Node.js APM agent with 10 MB captured bodies. The APM agent will serialize that request body as a string in its JSON payload to the APM server.
Maximum size per event (int)
Maximum permitted size of an event accepted by the server to be processed (in Bytes).
Default:307200 Bytes
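To get a feel for that limit, here is a rough sketch that compares the serialized size of an event against the 307200-byte default quoted above. The event objects are hypothetical placeholders, not the agent's actual wire format, and the helper names are made up for this example.

```javascript
// Default per-event limit quoted above, in bytes.
const MAX_EVENT_SIZE_BYTES = 307200;

// UTF-8 byte size of a value once serialized to JSON.
function serializedSizeBytes(event) {
  return Buffer.byteLength(JSON.stringify(event), 'utf8');
}

// True if the serialized event would be larger than the given limit.
function exceedsEventLimit(event, limitBytes = MAX_EVENT_SIZE_BYTES) {
  return serializedSizeBytes(event) > limitBytes;
}

// Hypothetical events with a ~10 KB body and a ~1 MiB body.
const smallEvent = { name: 'POST /orders', body: 'x'.repeat(10000) };
const largeEvent = { name: 'POST /orders', body: 'x'.repeat(1024 * 1024) };

console.log(exceedsEventLimit(smallEvent)); // false
console.log(exceedsEventLimit(largeEvent)); // true
```

Under these assumptions, even a 0.3 MB body lands at about the default limit, so the server-side setting would likely need raising too.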
The "General recommendations" page of the Elasticsearch Guide [8.11] includes a general recommendation for Elasticsearch to avoid large documents. It begins by talking about 100 MB documents. I don't have personal experience here, so I can't say whether some number of 10 MB documents might be problematic.
My gut feeling is that there might be real performance concerns here, both in the APM agent and from adding (possibly many) large string fields to the Elasticsearch data streams created for APM data (see "Data streams" in the Elasticsearch Guide [8.11]). As well, depending on whether and how you want to search over the captured request body documents, you may want to have control over the Elasticsearch index templates for this data. If so, a better design may be to have that data explicitly going to a data stream or index that is separate from the ones used by the APM system.
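As a rough sketch of that separate-stream idea, the snippet below builds an index template body and a document for a dedicated payload data stream. Everything named here is an assumption for illustration: the "debug-payloads-*" pattern, the helper functions, and the choice to store the payload unindexed (an object field with enabled: false is kept in _source but not parsed or searchable) while keeping trace.id as a keyword so a payload can be looked up from an APM trace.

```javascript
// Index template body for a hypothetical payload data stream.
function buildPayloadTemplate(indexPattern) {
  return {
    index_patterns: [indexPattern],
    data_stream: {},
    template: {
      mappings: {
        properties: {
          '@timestamp': { type: 'date' },
          // Keyword field used to correlate a payload with an APM trace.
          trace: { properties: { id: { type: 'keyword' } } },
          // Stored in _source only; not parsed or indexed.
          payload: { type: 'object', enabled: false },
        },
      },
    },
  };
}

// Document to index into that data stream for one captured request body.
function buildPayloadDoc(traceId, payload, now = new Date()) {
  return {
    '@timestamp': now.toISOString(),
    trace: { id: traceId },
    payload,
  };
}

const template = buildPayloadTemplate('debug-payloads-*');
const doc = buildPayloadDoc('abc123', { orderId: 42 }, new Date('2024-01-01T00:00:00Z'));
console.log(JSON.stringify(template, null, 2));
console.log(JSON.stringify(doc));
```

If you do want to search inside the bodies, you would map the relevant fields explicitly instead of disabling the object. That is exactly the kind of control you don't get on the APM-managed data streams.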