Hello @DavidTurner and @Christian_Dahlqvist
I was finally able to track down the cause using the following request:
GET _tasks?nodes=node-name&actions=*write*&detailed
Then I wrote a small Python script to parse the JSON response and extract only the task description (which contains the index name) and the task's running_time_in_nanos. This made it easy to see which index was spending the most time in the indices:data/write/bulk action.
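For reference, this is roughly what the script does. A minimal sketch only, not the exact script: the cluster URL and node name are placeholders, and it assumes an unauthenticated local cluster.

```python
# Sketch: list the longest-running write tasks on one node, with their
# descriptions (which include the target index name for bulk tasks).
import requests

ES_URL = "http://localhost:9200"  # placeholder: adjust URL/auth for your cluster
NODE = "node-name"                # placeholder: the hot node being investigated

resp = requests.get(
    f"{ES_URL}/_tasks",
    params={"nodes": NODE, "actions": "*write*", "detailed": "true"},
)
resp.raise_for_status()

rows = []
for node in resp.json()["nodes"].values():
    for task in node["tasks"].values():
        rows.append((task["running_time_in_nanos"], task.get("description", "")))

# Longest-running tasks first
for nanos, description in sorted(rows, reverse=True):
    print(f"{nanos / 1_000_000:.1f} ms  {description}")
```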
It was the only pipeline that sends data directly from Filebeat to Elasticsearch. This pipeline uses a third-party module with its own ingest pipeline (Wazuh), and we also created a final_pipeline to do some extra processing. Recently we added another ingest pipeline to do some enrichment; it is called from the final_pipeline using the pipeline processor.
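For context, the chaining looks roughly like this. A minimal sketch under assumptions: the pipeline IDs (my-final-pipeline, my-enrich-pipeline) are placeholders, and in our setup the final pipeline is attached to the index via the index.final_pipeline setting.

```python
# Sketch: create a final pipeline that hands documents off to the enrich
# pipeline via the `pipeline` processor.
import requests

ES_URL = "http://localhost:9200"  # placeholder: adjust URL/auth for your cluster

final_pipeline = {
    "description": "final pipeline that calls the enrich pipeline",
    "processors": [
        # ... other extra-processing processors ...
        {
            "pipeline": {
                "name": "my-enrich-pipeline",  # placeholder pipeline ID
                "ignore_failure": True,
            }
        }
    ],
}

resp = requests.put(f"{ES_URL}/_ingest/pipeline/my-final-pipeline", json=final_pipeline)
print(resp.json())
```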
The enrich pipeline is composed of around a hundred set processors with the following format:
{
  "set": {
    "field": "event.metadata",
    "value": "authentication;start;logged-in",
    "override": false,
    "if": "ctx.event?.code == '4624'",
    "ignore_failure": true
  }
},
{
  "set": {
    "field": "event.metadata",
    "value": "authentication;start;logon-failed",
    "override": false,
    "if": "ctx.event?.code == '4625'",
    "ignore_failure": true
  }
}
We created this enrich pipeline with set processors to replace an enrich processor we had tried before, whose performance was even worse: in that case the load on all hot nodes doubled as soon as the enrich processor was enabled. I opened a topic about it if you want to read more.
After I removed this pipeline with its hundred or so set processors, the load on the node returned to a normal value, similar to the load of the other hot nodes.
What I still do not understand is why only one node had performance issues. Shouldn't ingest be balanced across all four nodes? Would dedicated ingest nodes help with this, and is it possible to make just one pipeline use a specific ingest node?
Also, what improvements have ingest pipelines received in newer versions? We have an upgrade planned for next month, and I'm wondering whether I should give another shot to doing the enrichment directly in Elasticsearch, or move the ingestion to Logstash, which can do what I need pretty quickly.
Thanks, you can close the topic.