Hello,
I have a couple of Filebeat instances shipping data directly to Elasticsearch. When this data hits Elasticsearch, two ingest pipelines are executed: one from the module, and another one attached through the `final_pipeline` index setting. I created this final pipeline to do some transformations and enrichment without changing the original module pipeline, and it works without any problem or performance impact.
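For context, the final pipeline is attached with the `index.final_pipeline` index setting, roughly like this (index and pipeline names here are just placeholders):

```
PUT my-index/_settings
{
  "index.final_pipeline": "my-final-pipeline"
}
```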
Now I need to add some fields based on the value of another field, something similar to the `translate` filter in Logstash. Looking at the documentation, I saw that the way to do this in an ingest pipeline is with the `enrich` processor.
The enrich data seems pretty simple; it is meant to emulate how Winlogbeat sets the `event.*` fields.
Something like this:

```
{ "code": "4624", "category": "authentication", "type": "start", "action": "logged-in" }
{ "code": "4625", "category": "authentication", "type": "start", "action": "logon-failed" }
{ "code": "4634", "category": "authentication", "type": "end", "action": "logged-out" }
{ "code": "4647", "category": "authentication", "type": "end", "action": "logged-out" }
{ "code": "4648", "category": "authentication", "type": "start", "action": "logged-in-explicit" }
```
So I've created the index with the enrich data, the enrich policy, and the enrich processor. Everything worked almost as expected, but as soon as the enrich processor was enabled, the CPU load on the nodes more than doubled.
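For reference, the policy and processor definitions look roughly like this (index, policy, and field names here are illustrative, not my exact ones):

```
# Match policy over the index holding the lookup documents
PUT /_enrich/policy/event-codes-policy
{
  "match": {
    "indices": "enrich-event-codes",
    "match_field": "code",
    "enrich_fields": ["category", "type", "action"]
  }
}

# Build the internal enrich index from the policy
POST /_enrich/policy/event-codes-policy/_execute

# Enrich processor in the final pipeline
PUT /_ingest/pipeline/my-final-pipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "event-codes-policy",
        "field": "event.code",
        "target_field": "tmp_event",
        "ignore_missing": true
      }
    }
  ]
}
```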
Ingestion is done on 4 nodes with 10 vCPUs, 64 GB of RAM, a 30 GB heap, and SSD-based disks. The average CPU load is 6 to 7; with the enrich processor enabled it goes up to 14 or 15, sometimes even more. These nodes are also the hot data nodes in a hot/warm architecture.
This ingest pipeline receives an average of 2,500 events per second.
Is there any way to improve the performance of the enrich processor? What would be the best approach in this case? This is only one of the enrichments that I need to do.
Since each enrichment table would be no more than 100 or maybe 200 lines, I thought of forgetting the enrich processor and using a lot of `set` processors, each with a simple `if` conditional, coupled with `dissect` processors to create the multiple fields.
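What I have in mind is roughly this: one `set` per code writing a single packed value, and a `dissect` at the end fanning it out into the final fields (field names and the temporary field are just a sketch):

```
PUT /_ingest/pipeline/event-codes-set-dissect
{
  "processors": [
    {
      "set": {
        "if": "ctx.event?.code == '4624'",
        "field": "tmp_event_info",
        "value": "authentication start logged-in"
      }
    },
    {
      "set": {
        "if": "ctx.event?.code == '4625'",
        "field": "tmp_event_info",
        "value": "authentication start logon-failed"
      }
    },
    {
      "dissect": {
        "if": "ctx.tmp_event_info != null",
        "field": "tmp_event_info",
        "pattern": "%{event.category} %{event.type} %{event.action}"
      }
    },
    {
      "remove": {
        "field": "tmp_event_info",
        "ignore_missing": true
      }
    }
  ]
}
```

The appeal is that the `if` conditionals are plain in-memory checks on the document, with no lookup against another index.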
Does anyone have a benchmark of how an ingest pipeline with hundreds of `set` and `dissect` processors performs? Since they do not query any index, I'm assuming they are lighter than an equivalent `enrich` processor.
While this is my only type of data ingested directly from Filebeat to Elasticsearch, I'm trying to avoid the work of migrating the ingest pipeline to a Logstash pipeline and changing the entire ingestion strategy on a couple of servers.