Remove double quotation marks (") from all field values

I'm having some issues with normalizing my CEF logs coming over Elastic Agent's CEF integration.
To be precise, they're logs from Silverfort, and there's no way to configure detailed logging options on it to remove the double quotation marks. Here is a part of the raw message:

<134>Nov 25 09:56:59 host CEF:0|Silverfort|Admin Console|1.2.3.4-RC|Authentication|Authentication request|2|rt="1732528619954" suser="user@example.com" sntdom="example.com" shost="host01" src="10.0.0.0" destinationServiceName="dest@example.com" dhost="host02" ...

After being indexed, the fields have quotation marks in the values:

"user": {
   "name": "\"user@example.com\""
}

Any help is appreciated. I tried playing around with painless scripts on the CEF integration under Processors, and with a Logstash ruby filter.
So far, no luck.

The key is to do the processing before it hits the Elasticsearch CEF ingest pipeline as the pipeline does not know how to handle it, and I would like to avoid modifying it since it will be overwritten on updates.

[error in field 'rt': value is not a valid timestamp, error in field 'src': value is not a valid IP address]

Does anyone have a quick and dirty loop through all and remove specific character script?

Cheers,
Luka

The CEF integration allows specifying beats processors which would run well before the ingest pipeline.

The replace processor allows wildcard field selection and text find/replace which may allow you to strip quotes

Hi @strawgate,

Thanks for the prompt response. AFAIK the processors run after the integration does its work.
In this example, since it is the CEF integration first it would run the decode_cef processor, which would in turn populate all of the fields with quotations in the field values.

The replace processor can only work on one field at a time, so it would not be a valid solution here. Unless the processors run before any of the other beats processors built into an integration.

In my case, using the standard TCP integration with the below processors now fixes such cases.

- rename:
    fields:
      - {from: "message", to: "event.original"}

- replace:
    fields:
      - field: "event.original"
        pattern: "\""
        replacement: ""
    ignore_missing: false
    fail_on_error: true

- decode_cef:
    field: event.original

It would probably be a good idea to open an enhancement request for the CEF integration to do this beforehand, as indexing fields with quotation marks is not really something you should do.

Cheers,
Luka
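The following is an added example, not code from the thread or from Elastic: a small Python script that reproduces the problem end to end using a constructed message modelled on the one above. It strips the quotes before CEF decoding (the same order as the rename/replace/decode_cef chain in the post), parses the CEF header and extensions with a minimal hand-written parser, and then runs the two type checks the ingest pipeline complained about (rt as epoch milliseconds, src as an IP address). The parser, the sample message and the check messages are assumptions made for the demo; it also shows a narrower regex that only removes quotes wrapping a whole value, which is safer than deleting every quote in the line.

```python
import ipaddress
import re
from datetime import datetime, timezone

# Constructed sample message modelled on the one in the thread (placeholder values).
RAW_QUOTED = (
    '<134>Nov 25 09:56:59 host CEF:0|Silverfort|Admin Console|1.2.3.4-RC|Authentication|'
    'Authentication request|2|rt="1732528619954" suser="user@example.com" '
    'sntdom="example.com" shost="host01" src="10.0.0.0" '
    'destinationServiceName="dest@example.com" dhost="host02"'
)

HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity"]


def strip_quotes(message: str) -> str:
    """Equivalent of the thread's replace processor: drop every '"' in the line."""
    return message.replace('"', "")


def strip_value_quotes(message: str) -> str:
    """Narrower alternative: remove only a quote pair that wraps a whole extension
    value, i.e. a quote right after '=' running to the next whitespace or the end.
    Quotes that are part of a value (x=a"b) are left alone."""
    return re.sub(r'="([^"]*)"(?=\s|$)', r"=\1", message)


def parse_cef(message: str) -> dict:
    """Minimal CEF parser for the demo (hypothetical, not decode_cef): seven
    pipe-delimited header fields, then key=value extensions. CEF backslash
    escaping is not implemented; it is enough to show the quoting issue."""
    start = message.find("CEF:")
    if start < 0:
        raise ValueError("no CEF header")
    parts = message[start + len("CEF:"):].split("|", 7)  # 8th element = extensions
    if len(parts) != 8:
        raise ValueError("malformed CEF header")
    event = dict(zip(HEADER_FIELDS, parts[:7]))
    # A value runs until the next " key=" token or the end of the string.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", parts[7])
    event["extensions"] = dict(pairs)
    return event


def validate(event: dict) -> list:
    """Mimic the ingest pipeline's type checks on rt (epoch millis) and src (IP)."""
    errors = []
    ext = event["extensions"]
    if "rt" in ext:
        try:
            datetime.fromtimestamp(int(ext["rt"]) / 1000, tz=timezone.utc)
        except ValueError:
            errors.append("error in field 'rt': value is not a valid timestamp")
    if "src" in ext:
        try:
            ipaddress.ip_address(ext["src"])
        except ValueError:
            errors.append("error in field 'src': value is not a valid IP address")
    return errors


def process(message: str, stripper=None) -> tuple:
    """Optionally pre-process the raw line, then decode and validate it."""
    if stripper is not None:
        message = stripper(message)
    event = parse_cef(message)
    return event, validate(event)


if __name__ == "__main__":
    for name, fn in (("none", None), ("strip_quotes", strip_quotes),
                     ("strip_value_quotes", strip_value_quotes)):
        event, errors = process(RAW_QUOTED, fn)
        ext = event["extensions"]
        print(f"{name:>18}: suser={ext['suser']!r} src={ext['src']!r} errors={errors}")
```

Without any pre-processing the parser keeps the quotes as part of each value, so `user.name` ends up as `"user@example.com"` and both validation errors from the thread appear; with either stripping function applied first, the values decode cleanly. Whether the narrower pattern can be used directly in the beats replace processor depends on that processor's regex engine and replacement syntax, which is not checked here.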

Ah yes, I misread the replace doc and thought you could field select with a wildcard.

Glad you were able to find a solution for your particular cef source.

Typically, minimal processing is done on the agent side, with most processing done in the ingest pipeline. I hadn't looked before, but I did look and you're right: the CEF decoding happens on the agent, and so processors run after CEF decoding and before ingest pipelines for this integration.

You can see here (integrations/packages/cef/data_stream/log/agent/stream/log.yml.hbs at main · elastic/integrations · GitHub) that the CEF integration doesn't do much more on the agent than read the file and call decode_cef anyway.
