Elastic APM sanitization on message body

We are using Elastic APM with Elasticsearch 7.8 and the Java agent, and we would like to capture the HTTP message body. However, it does not work as expected.

First, I expected sanitization to ensure that a field is not indexed if its property name matches a configured pattern. However, the sensitive value is present in http.request.body.original, while other fields such as http.request.headers.Authorization are indeed redacted. This shows that sanitization is configured properly but, for some reason, does not work on the message body. Perhaps it expects a field name and, because of the following issue, the field has not been extracted.

Second, the entire message body is shown as a single value (the whole JSON object is stored as is and has not been indexed as separate fields). This means I have the following JSON value:

{
  "email": "test@abc.com",
  "password": "xxxxx"
}

I expected to have separate values for email and password (the latter redacted). Am I missing something here?

The agent captures the request body as a string; it doesn't attempt to parse it according to any specific schema. Therefore, it does not index JSON fields and, for the same reason, cannot sanitize specific fields inside the body.
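To illustrate why name-based sanitization cannot reach into a string body, here is a minimal Python sketch. It is only a model of the idea, not the agent's actual code; the `sanitize` helper, the pattern list, and the `body.original` key are assumptions made up for this example.

```python
import json
from fnmatch import fnmatch

# Hypothetical wildcard patterns for illustration only;
# the agent's real list comes from its own configuration.
PATTERNS = ["password", "*token*", "authorization"]

def sanitize(fields, patterns=PATTERNS):
    """Redact values whose *key* matches a wildcard pattern (case-insensitive)."""
    return {
        key: "[REDACTED]" if any(fnmatch(key.lower(), p) for p in patterns) else value
        for key, value in fields.items()
    }

headers = {"Authorization": "Bearer abc123", "Content-Type": "application/json"}
body = '{"email": "test@abc.com", "password": "xxxxx"}'

# Header names are visible as keys, so the pattern match works.
print(sanitize(headers))
# The body is one opaque string under a single key: "password" is never seen as a key.
print(sanitize({"body.original": body}))
# Only after parsing does the key become visible to a name-based check.
print(sanitize(json.loads(body)))
```

The second call leaves the body untouched, which mirrors what is observed in http.request.body.original; the third shows that parsing has to happen before any per-field redaction is possible.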

You should be able to achieve what you want by using ingest node pipelines, specifically using the JSON processor.

I hope this helps.

Do I also need to apply sanitization as an additional processor? Does sanitization happen at the agent level, or does the server do the magic?

Moreover, we are using Elastic Cloud and I couldn't find a way to add a new pipeline to APM server. Is this only achievable in the self-hosted version?

The sanitization you see in the agent configuration is done by the agents. For the body, you need to handle it through processors; however, you do not necessarily need more than one. From looking at the JSON processor, it seems you should be able to pick only the fields you want in the first place.

Thanks. I am not sure how I can use the JSON processor with a condition to skip a field if it matches a certain pattern. The issue with the JSON processor is that I can only put a condition on the input, but what I am looking for is a condition on the output. For instance, in the example I shared, I would like to skip only password; I would still like to keep the email address.

This can be accomplished on Elastic Cloud by editing the apm processing pipeline that is always deployed there. This can be done via the UI at <kibana_url>/app/kibana#/management/elasticsearch/ingest_pipelines?pipeline=apm or via the standard Kibana APIs.

Here is a complete example that I believe can accomplish what you have in mind:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "redact http.request.body.original.password",
    "processors": [
      {
        "json": {
          "field": "http.request.body.original",
          "target_field": "http.request.body.original_json",
          "ignore_failure": true
        }
      },
      {
        "remove": {
          "field": "http.request.body.original",
          "if": "ctx?.http?.request?.body?.original_json != null",
          "ignore_failure": true
        }
      },
      {
        "set": {
          "field": "http.request.body.original_json.password",
          "value": "[redacted]",
          "if": "ctx?.http?.request?.body?.original_json != null"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "http": {
          "request": {
            "body": {
              "original": """{"email": "test@abc.com", "password": "itsasecret"}"""
            }
          }
        }
      }
    },
    {
      "_source": {
        "nobody": true
      }
    },
    {
      "_source": {
        "http": {
          "request": {
            "body": {
              "original": """["invalid json" """
            }
          }
        }
      }
    }
  ]
}
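For readers who want to reason about the result without a cluster at hand, the following Python sketch mimics what the three processors above do to the three sample documents. It is an approximation written for illustration, not Elasticsearch code; `get_path` and `run_pipeline` are hypothetical helpers, and the sketch assumes the body parses to a JSON object.

```python
import json

def get_path(doc, path):
    """Return the value at a dotted path, or None if any segment is missing."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def run_pipeline(doc):
    """Mimic the json, remove, and set processors on a copy of one document."""
    doc = json.loads(json.dumps(doc))  # deep copy so the input stays untouched
    body = get_path(doc, "http.request.body")
    if not isinstance(body, dict):
        return doc  # nothing to do, like the "nobody" sample
    # json processor: parse the string; a failure is swallowed (ignore_failure).
    try:
        body["original_json"] = json.loads(body["original"])
    except (KeyError, TypeError, ValueError):
        pass
    # remove and set processors: both run only if parsing produced a value.
    if body.get("original_json") is not None:
        body.pop("original", None)
        body["original_json"]["password"] = "[redacted]"
    return doc

docs = [
    {"http": {"request": {"body": {
        "original": '{"email": "test@abc.com", "password": "itsasecret"}'}}}},
    {"nobody": True},
    {"http": {"request": {"body": {"original": '["invalid json" '}}}},
]
results = [run_pipeline(d) for d in docs]
for result in results:
    print(json.dumps(result))
```

In this model the first document keeps email and gets a redacted password, the second is unchanged, and the third keeps its original string because parsing failed. Note that the set step runs whenever parsing succeeded, so a body without a password key would still gain a password field set to "[redacted]".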

Thanks. So what you are suggesting is to update the 'apm' pipeline and register the new processors. Do I need to restart APM Server, or will the change be loaded dynamically?

It is not necessary to restart anything. Once you have updated the pipeline in Elasticsearch, any new data that APM Server sends to Elasticsearch will be run through the updated pipeline.