Is it possible to redact parts of the captured body in NodeJS APM?

I'm using the NodeJS APM agent to capture activity on my server. I currently have the APM init configured with captureBody: 'errors', which is great for debugging, but it doesn't seem to redact anything. This means that when an error is thrown due to an invalid password, the password gets logged to our ELK instance. That's definitely not ideal. Is there a way to add a middleware or hook into the body-capturing functionality to redact parts of the body, or to not include any body depending on the URL?

Elastic Cloud
Kibana version: 8.8.1
Elasticsearch version: 8.8.1
APM Agent language and version: NodeJS 3.46.0

Hi @pocketcolin,

It is not obvious at all, but sanitization/redaction of fields in a captured incoming HTTP request body is only done for form data, i.e. requests with Content-Type: application/x-www-form-urlencoded. This is mentioned on the "Configuration options" page of the APM Node.js Agent Reference [3.x]. I'm not sure whether it is discussed elsewhere in the docs. I've opened issue #3426 on elastic/apm-agent-nodejs ("`captureBody` docs don't mention the conditions for sanitization/redaction") to add a mention of this to the captureBody docs.
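To make that condition concrete, here is a minimal sketch of the rule as a standalone helper. The name isFormUrlEncoded is hypothetical and this is not the agent's internal code; it also assumes that only the media type matters, so parameters such as "; charset=utf-8" are ignored.

```javascript
// Hypothetical helper illustrating the rule above: request-body sanitization
// only applies to form-encoded bodies. NOT the agent's actual implementation.
function isFormUrlEncoded (contentType) {
  if (typeof contentType !== 'string') return false
  // Assumption: parameters like "; charset=utf-8" are ignored and the media
  // type is compared case-insensitively.
  const mediaType = contentType.split(';')[0].trim().toLowerCase()
  return mediaType === 'application/x-www-form-urlencoded'
}

console.log(isFormUrlEncoded('application/x-www-form-urlencoded')) // true
console.log(isFormUrlEncoded('application/x-www-form-urlencoded; charset=utf-8')) // true
console.log(isFormUrlEncoded('application/json')) // false
```

So a JSON login payload, which is the common case for APIs, falls outside the agent's built-in body sanitization.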

Take this example. (I'm using apm.addTransactionFilter() to conveniently dump the transaction data before it is sent on to the APM server. This is also foreshadowing. 🙂)

// capturebody.example.js
// Start the agent before requiring express so that it can instrument it.
const apm = require('elastic-apm-node').start({
  // serverUrl: '...',
  // secretToken: '...',
  serviceName: 'capturebody-example',
  apiRequestTime: '2s', // flush data to APM Server every 2s
  metricsInterval: '0s', // disable metrics collection to keep the output short
  captureBody: 'all' // capture request bodies for all transactions, not just errors
})

// Dump each transaction just before it is sent to APM Server.
apm.addTransactionFilter(trans => {
  console.log('about to send this transaction: ', trans)
  return trans
})

const bodyParser = require('body-parser')
const express = require('express')

const app = express()
app.use(bodyParser.json())
app.use(bodyParser.urlencoded({extended: false}))
app.post('/ping', function (req, res) {
  console.log(`received request: ${req.method} ${req.url}, content-type:${req.headers['content-type']}`)
  res.send({ping: 'pong'})
})
app.listen(3000, () => {
  console.log('listening at http://127.0.0.1:3000/ping')
})

If we run that and then POST to it with form data:

% curl -v http://127.0.0.1:3000/ping -X POST -d foo=bar -d passwd=secret
...
> POST /ping HTTP/1.1
> Content-Type: application/x-www-form-urlencoded
...

Then the transaction data shows that the "passwd" field, which matches one of the default sanitizeFieldNames patterns listed on the "Configuration options" page of the APM Node.js Agent Reference [3.x], is redacted:
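The patterns in sanitizeFieldNames are wildcard patterns on field names. Below is a minimal sketch of that kind of matching, useful if you want to apply the same idea to JSON bodies yourself. It assumes that `*` matches any run of characters and that matching is case-insensitive, and the pattern list is only a subset of what I believe the defaults are; check the sanitizeFieldNames docs for the authoritative list. isSensitiveField is a hypothetical name.

```javascript
// Sketch of wildcard field-name matching in the spirit of sanitizeFieldNames.
// Assumptions: '*' matches any run of characters; matching is case-insensitive.
// The list below is an illustrative subset, not the agent's full default list.
const PATTERNS = ['password', 'passwd', 'pwd', 'secret', '*key', '*token*', '*session*', '*auth*']

function escapeRegExp (s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function toRegExp (pattern) {
  // Escape regex metacharacters in each literal piece, then join with '.*'.
  const source = pattern.split('*').map(escapeRegExp).join('.*')
  return new RegExp(`^${source}$`, 'i')
}

const MATCHERS = PATTERNS.map(toRegExp)

function isSensitiveField (name) {
  return MATCHERS.some(re => re.test(name))
}

console.log(isSensitiveField('passwd')) // true
console.log(isSensitiveField('X-Auth-Token')) // true
console.log(isSensitiveField('apiKey')) // true
console.log(isSensitiveField('foo')) // false
```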

% node capturebody.example.js
...
received request: POST /ping, content-type:application/x-www-form-urlencoded
about to send this transaction:  {
  id: 'a26c10c366d34a8e',
  trace_id: '841e9449f6dd84158bd459baaeb6c3bf',
  parent_id: undefined,
  name: 'POST /ping',
  type: 'request',
  duration: 3.551,
  timestamp: 1686610832961007,
  result: 'HTTP 2xx',
  sampled: true,
  context: {
    user: {},
    tags: {},
    custom: {},
    service: {},
    cloud: {},
    message: {},
    request: {
      http_version: '1.1',
      method: 'POST',
      url: [Object],
      headers: [Object],
      socket: [Object],
      body: '{"foo":"bar","passwd":"[REDACTED]"}'
    },
    response: { status_code: 200, headers: [Object] }
  },
  span_count: { started: 0 },
  outcome: 'success',
  faas: undefined,
  sample_rate: 1
}

However, if we POST with a JSON content-type:

% curl http://127.0.0.1:3000/ping -X POST -H content-type:application/json -d '{"foo":"bar","passwd":"secret"}'

Then there is no redaction:

received request: POST /ping, content-type:application/json
about to send this transaction:  {
  id: 'dabeae3b13849dd7',
  trace_id: '764730e22d8a60f4428aeba274681f59',
  parent_id: undefined,
  name: 'POST /ping',
  type: 'request',
  duration: 14.57,
  timestamp: 1686610819372043,
  result: 'HTTP 2xx',
  sampled: true,
  context: {
    user: {},
    tags: {},
    custom: {},
    service: {},
    cloud: {},
    message: {},
    request: {
      http_version: '1.1',
      method: 'POST',
      url: [Object],
      headers: [Object],
      socket: [Object],
      body: '{"foo":"bar","passwd":"secret"}'
    },
    response: { status_code: 200, headers: [Object] }
  },
  span_count: { started: 0 },
  outcome: 'success',
  faas: undefined,
  sample_rate: 1
}

As I hinted above, you can use apm.addTransactionFilter(fn) to filter the captured data however your application requires. Something like this:

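apm.addTransactionFilter(trans => {
  const request = trans?.context?.request
  if (request && typeof request.body === 'string') {
    try {
      const body = JSON.parse(request.body)
      // Only plain objects can have a 'passwd' field (skip arrays, strings, null).
      if (body && typeof body === 'object' && 'passwd' in body) {
        body.passwd = '[REDACTED]'
        request.body = JSON.stringify(body)
      }
    } catch (_err) {
      // Not JSON (e.g. form data or the '[REDACTED]' placeholder): leave as-is.
    }
  }
  // console.log('about to send this transaction: ', trans)
  return trans
})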
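If sensitive keys can appear at different depths, or under several names, a filter along those lines can be generalized. The sketch below is one way to do it, not an agent API: redactJsonBody and the isSensitive predicate are hypothetical names. It walks the parsed JSON recursively and only re-serializes when something was actually redacted.

```javascript
// Hypothetical helper: redact sensitive keys anywhere in a JSON request body.
// `isSensitive` is any predicate on a key name. Bodies that are not JSON
// (including the '[REDACTED]' placeholder) are returned unchanged.
function redactJsonBody (body, isSensitive) {
  if (typeof body !== 'string') return body
  let parsed
  try {
    parsed = JSON.parse(body)
  } catch (_err) {
    return body
  }
  let changed = false
  const walk = (value) => {
    if (Array.isArray(value)) {
      value.forEach(walk)
    } else if (value !== null && typeof value === 'object') {
      for (const key of Object.keys(value)) {
        if (isSensitive(key)) {
          value[key] = '[REDACTED]'
          changed = true
        } else {
          walk(value[key])
        }
      }
    }
  }
  walk(parsed)
  // Re-serialize only when something changed, keeping the original text otherwise.
  return changed ? JSON.stringify(parsed) : body
}

const isSensitive = key => /^(password|passwd|secret)$/i.test(key)
console.log(redactJsonBody('{"foo":"bar","user":{"passwd":"secret"}}', isSensitive))
// {"foo":"bar","user":{"passwd":"[REDACTED]"}}
console.log(redactJsonBody('[REDACTED]', isSensitive)) // [REDACTED]
```

Inside a filter you would then assign the result back, e.g. `trans.context.request.body = redactJsonBody(trans.context.request.body, isSensitive)`.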

Thanks, Trent! That is exactly what I was looking for. And it's very interesting to see that request.body is a JSON string, because that transaction param is unfortunately just typed as { [propName: string]: any }.

Yeah, we don't have strong TypeScript typings for the payload.

FWIW, we do have JSON Schema definitions for those payloads once they are serialized to JSON. The entry for the transaction request body is in test/integration/api-schema/apm-server-schema/transaction.json in the elastic/apm-agent-nodejs repository on GitHub (main branch).

@trentm, a question: does captureBody not capture bodies for 4xx errors? I'm seeing redacted bodies on my error transactions for some 400 responses.

@pocketcolin, I think this comes down to confusion over whether this is a captured request body on a "transaction" APM event or on an "error" APM event. You have configured the APM agent with captureBody: 'errors', so you should only see a captured request body on "error" APM events.

On "transaction" APM events, you should expect to see transaction.context.request.body = '[REDACTED]'. (I find this somewhat confusing. It isn't that the body was captured and then redacted because it contained sensitive data; rather, the body was simply not captured. I think leaving that field unset in the captured data would be clearer for this case.)
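A practical consequence for the original question: with captureBody: 'errors', the real request body is attached to "error" events, so a transaction filter alone will not scrub it. Below is a sketch that registers one scrubbing function with both apm.addTransactionFilter() and apm.addErrorFilter(), and drops the body entirely for selected routes. scrubBody, installBodyScrubbing, and the DROP_BODY_PATHS routes are hypothetical, and it assumes the payload's context.request.url carries a pathname field, as in the APM Server intake schema.

```javascript
// Sketch: scrub request bodies on both transaction and error events.
// Hypothetical routes whose bodies should never be sent to APM Server.
const DROP_BODY_PATHS = new Set(['/login', '/signup'])

function scrubBody (payload) {
  const request = payload?.context?.request
  if (!request || request.body === undefined) return payload
  // Assumption: request.url.pathname is present on the payload.
  if (DROP_BODY_PATHS.has(request.url?.pathname)) {
    request.body = '[REDACTED]'
  }
  // Filters must return the payload; a falsy return drops the event.
  return payload
}

// `apm` is the started agent, e.g. require('elastic-apm-node').start({...}).
function installBodyScrubbing (apm) {
  apm.addTransactionFilter(scrubBody)
  apm.addErrorFilter(scrubBody)
}
```

Call installBodyScrubbing(apm) once, right after starting the agent.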

So, for your case, I think the question is: why don't you see "error" APM events for incoming HTTP requests that result in a 4xx response status code? Ultimately this depends on the instrumentation for the web framework you are using, but in general a 4xx status code is not considered an error on the server side, because it is a client error. That, at least, is the decision in our APM agent specs. From https://github.com/elastic/apm/blob/main/specs/agents/tracing-transactions.md#transaction-outcome:

  • "failure": Indicates that this transaction describes a failed result.
    Note that client errors (such as HTTP 4xx) don't fall into this category as they are not an error from the perspective of the server.
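Read from the server side, that rule boils down to something like the sketch below: only 5xx responses mark an HTTP server transaction as a failure. serverTransactionOutcome is a hypothetical name, not the agent's code, and falling back to 'unknown' for a missing status code is an assumption based on 'unknown' being one of the spec's outcome values.

```javascript
// Sketch of the server-side outcome rule quoted above. 4xx responses are
// client errors, so they count as 'success' from the server's perspective.
function serverTransactionOutcome (statusCode) {
  if (!Number.isInteger(statusCode)) return 'unknown'
  return statusCode >= 500 ? 'failure' : 'success'
}

console.log(serverTransactionOutcome(200)) // success
console.log(serverTransactionOutcome(400)) // success
console.log(serverTransactionOutcome(503)) // failure
```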
