Remove double quotation marks (") from all field values

I'm having some issues with normalizing my CEF logs coming over Elastic Agent's CEF integration.
To be precise, they're logs from Silverfort, and there's no way to configure detailed logging options on it to remove the double quotation marks. Here is a part of the raw message:

<134>Nov 25 09:56:59 host CEF:0|Silverfort|Admin Console|1.2.3.4-RC|Authentication|Authentication request|2|rt="1732528619954" suser="user@example.com" sntdom="example.com" shost="host01" src="10.0.0.0" destinationServiceName="dest@example.com" dhost="host02" ...

After being indexed, the fields have quotation marks in the values:

"user": {
   "name": "\"user@example.com\""
}

Any help is appreciated. I tried playing around with Painless scripts on the CEF integration under Processors, and with a Logstash ruby filter.
So far, no luck.

The key is to do the processing before the data hits the Elasticsearch CEF ingest pipeline, since the pipeline does not know how to handle the quotes. I would also like to avoid modifying the pipeline itself, since it will be overwritten on updates.

[error in field 'rt': value is not a valid timestamp, error in field 'src': value is not a valid IP address]

Does anyone have a quick-and-dirty script that loops through all fields and removes a specific character?

Cheers,
Luka

The CEF integration allows specifying Beats processors, which would run well before the ingest pipeline.

The replace processor allows wildcard field selection and text find/replace, which may allow you to strip the quotes.

Hi @strawgate,

Thanks for the prompt response. AFAIK the processors run after the integration does its work.
In this case, the CEF integration first runs the decode_cef processor, which in turn populates all of the fields with quotation marks in the values.

The replace processor can only work on one field at a time, so it would not be a valid solution here, unless the processors run before the other Beats processors built into the integration.
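For reference, here is a hypothetical sketch of the "loop through all fields and remove a character" approach I was originally after, as the core logic of a Logstash ruby filter might look (the field names and values are just for illustration):

```ruby
# Hypothetical sketch: recursively walk a decoded event (nested
# hashes/arrays) and strip double quotes from every string value.
def strip_quotes(value)
  case value
  when String then value.delete('"')
  when Hash   then value.transform_values { |v| strip_quotes(v) }
  when Array  then value.map { |v| strip_quotes(v) }
  else value
  end
end

event = { 'user' => { 'name' => '"user@example.com"' }, 'src' => '"10.0.0.0"' }
cleaned = strip_quotes(event)
# cleaned['user']['name'] is now user@example.com with the quotes gone
```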

In my case, using the standard TCP integration with the processors below now fixes such cases.

- rename:
    fields:
      - {from: "message", to: "event.original"}

- replace:
    fields:
      - field: "event.original"
        pattern: "\""
        replacement: ""
    ignore_missing: false
    fail_on_error: true

- decode_cef:
    field: event.original

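The replace step above simply strips every double quote from the raw message before decode_cef runs. As a sketch, the same logic in plain Ruby (the sample message is shortened from the one above):

```ruby
# Strip literal double quotes from a raw CEF message before decoding,
# mirroring the replace processor above (pattern: "\"" -> "").
raw = '<134>Nov 25 09:56:59 host CEF:0|Silverfort|Admin Console|1.2.3.4-RC|' \
      'Authentication|Authentication request|2|rt="1732528619954" suser="user@example.com"'

cleaned = raw.gsub('"', '')

puts cleaned
# The extension values are now rt=1732528619954 suser=user@example.com,
# and the pipe-delimited CEF header is untouched.
```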
It would probably be a good idea to open an enhancement request for the CEF integration to handle this out of the box, as indexing field values with quotation marks is not really something you should do.

Cheers,
Luka

Ah yes, I misread the replace doc and thought you could select fields with a wildcard.

Glad you were able to find a solution for your particular CEF source.

Typically, minimal processing is done on the agent side, with most processing done in the ingest pipeline. I hadn't checked before, but you're right: for this integration the CEF decoding happens on the agent, so processors run after CEF decoding and before the ingest pipeline.

You can see in integrations/packages/cef/data_stream/log/agent/stream/log.yml.hbs at main · elastic/integrations · GitHub that the CEF integration doesn't do much more on the agent than read the input and call decode_cef anyway.
